Abstract

A divide-and-conquer based approach for computing the Moore-Penrose pseudo-inverse of the combinatorial
Laplacian matrix ($L^+$) of a simple, undirected graph is proposed. The nature of the underlying sub-problems
is studied in detail by means of an elegant interplay between $L^+$ and the effective resistance distance ($\Omega$).
Closed forms are provided for a novel two-stage process that helps compute the pseudo-inverse incrementally.
Analogous scalar forms are obtained for the converse case, that of structural regress, which entails the
breaking up of a graph into disjoint components through successive edge deletions. The scalar forms in both
cases show absolute element-wise independence at all stages, thus suggesting potential parallelizability.
Analytical and experimental results are presented for dynamic (time-evolving) graphs as well as large graphs
in general (representing real-world networks). An order of magnitude reduction in computational time is
achieved for dynamic graphs; in the general case, our approach performs better in practice than the
standard methods, even though the worst case theoretical complexities may remain the same: an important
contribution with consequences to the study of online social networks.

1. Introduction

The combinatorial Laplacian matrix of a graph finds use in various aspects of structural analysis
[25, 28, 30, 38, 39, 41, 50]. The eigen spectrum of the Laplacian determines significant
topological characteristics of the graph, such as minimal cuts, clustering and the number of spanning trees
[25]. Likewise, the Moore-Penrose pseudo-inverse and the sub-matrix inverses of the Laplacian have
evoked great interest in recent times. Their applications span fields as diverse as probability and mathematical
chemistry, collaborative recommendation systems and social networks, epidemiology and infrastructure
planning [28, 31, 32, 33, 40, 46, 55]. A brief discussion of the specific applications is provided for reference in
a subsequent section (c.f. §6). Alas, despite such versatility, the pseudo-inverse and the sub-matrix inverses
of the Laplacian suffer a practical handicap: these matrices are notoriously expensive to compute. The
standard matrix factorization and inversion based methods employed to compute them [8, 55] incur
$O(n^3)$ computational time, $n$ being the order of the graph (the number of vertices in the graph). This clearly
impedes their utility, particularly when the graphs are either dynamic, i.e. changing with time, or simply
of large order, i.e. have millions of nodes. Online social networks (OSNs), typically represented as graphs,
qualify on both counts. With time, the number of users as well as the relationships between them change,
thus requiring regular re-computations. As for size, a popular OSN, such as Facebook or YouTube, may
easily have hundreds of millions of users. An $O(n^3)$ cost, therefore, is clearly undesirable and an approach
for incremental updates is imperative, particularly given that such changes, in most cases, may be local in
nature.

In this work, we provide a novel divide-and-conquer based approach for computing the Moore-Penrose
pseudo-inverse of the Laplacian for an undirected graph which, in turn, determines all of its sub-matrix
inverses as well. The divide operation in our approach entails determining an arbitrary connected bi-partition
of the graph $G(V, E)$, i.e. a cut of the graph that is made up of exactly two connected sub-graphs (say
$G_1(V_1, E_1)$ and $G_2(V_2, E_2)$), by deleting $k$ edges from it. As $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ are simple and
connected themselves, the pseudo-inverses of their Laplacians, when computed, constitute solutions to two
independent sub-problems. Better still, they can be computed in parallel (given two machines instead
of one). In the conquer step, we recombine these solutions in an iterative manner by re-introducing the
edges in the cut set one at a time to reconstruct the original graph and obtain the overall pseudo-inverse
in the process. Note that this process yields a sequence of intermediate spanning sub-graphs of $G$ (say
$\{G_1, G_2\} \to G_3 \to G_4 \to \dots \to G_{k+2}$), where $G_{k+2} = G(V, E)$. The first transition $\{G_1, G_2\} \to G_3$
represents a point of singularity in our method, at which the disjoint components $\{G_1, G_2\}$ get connected
through a bridge edge to yield $G_3$, a sub-graph with exactly one component. We call this stage the first join
in our process. After the first join, all intermediate sub-graphs from $G_4$ to $G_{k+2}$ are obtained by introducing
an edge in a sub-graph that is already connected. We call this edge firing (details in a subsequent section).
We then show that the pseudo-inverse of the Laplacian for any intermediate sub-graph in this sequence is
determined entirely in terms of the pseudo-inverse of the Laplacian for its predecessor. Our results, presented
in an element-wise scalar form, reveal several interesting properties of the sub-problems. First and foremost,
if $n$ is the order of the graph $G(V, E)$, then the cost incurred at each intermediate stage is $O(n^2)$ if the
solution to the sub-problems for the immediate predecessor is known. Therefore, the cost of computing the
pseudo-inverse for $G(V, E)$ is $O(k \cdot n^2)$, if the pseudo-inverses for $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ are known.
Secondly, using these forms, each element of the pseudo-inverse for an intermediate graph can be computed
independent of the other elements. Hence, given multiple machines, the overall computational time is reduced
further through parallelization. Moreover, we obtain similar closed form solutions for the case of structural
regress of the graph, i.e. when vertices or edges are deleted from it. A straightforward consequence is that
the pseudo-inverses for dynamic time-evolving graphs, such as OSNs, can now be updated when a node
joins or leaves the network or an edge (a relationship) appears/disappears in it, at an $O(n^2)$ cost overall (as
$k \ll n$).

Last but not least, we use these insights to compute the pseudo-inverses of the Laplacians of large
real-world networks from the domain of online social networks. Real-world networks, and social ones in
particular, are reported to have some notable characteristics such as edge sparsity, power-law and scale-free
degree distributions [7, 22], small-world characteristics [54] etc. Given these properties, we note that
interesting algorithms (heuristics) can be developed for fast and parallel computations for the general case
based on our divide-and-conquer strategy. Thus, even though the theoretical worst case costs stay at $O(n^3)$
for general graphs, the practical gains are significant enough to warrant attention. We discuss both analytical
and experimental aspects of these in detail in the subsequent sections.

The rest of the paper is organized into the following sections: we begin by introducing the preliminaries
of our work (the pseudo-inverse and the sub-matrix inverses of the Laplacian along with their properties,
and the interplay of the pseudo-inverse and the effective resistance distance) in §2. In §3, we describe our
divide-and-conquer strategy involving connected bi-partitions and the two-stage process for computing the
Moore-Penrose pseudo-inverse of the Laplacian. Relevant scalar forms are presented in each case. In §4,
we establish the same closed forms for a graph in regress, i.e. deleting edges one at a time until the graph
breaks into two. We then apply the divide-and-conquer methodology to compute the pseudo-inverses for
dynamically changing graphs as well as those of real world networks in §5. In §6, we briefly overview related
literature discussing specific application scenarios. The paper is finally concluded in §7 with a summary of
results and a discussion of potential future work.

2. The Laplacian, Sub-Matrix Inverses and A Distance Function

In this section, we provide a brief introduction to the set of matrices studied in this work, namely,
the combinatorial Laplacian of a graph ($L$), its Moore-Penrose pseudo-inverse ($L^+$) and the set of
sub-matrix inverses of $L$ (§2.1). We then demonstrate how all the sub-matrix inverses of the Laplacian can be
computed in terms of the pseudo-inverse in §2.2. Finally, in §2.3, we describe the relationship between the
effective resistance distance, a Euclidean metric, and the elements of the Moore-Penrose pseudo-inverse of
the Laplacian, an equivalence that we exploit to great advantage in the rest of this work.

2.1. The Laplacian and its Moore-Penrose Pseudo-Inverse 

Let $G(V, E)$ be a simple, connected and undirected graph. We denote by $n = |V(G)|$ the number of
nodes/vertices in $G$, also called the order of the graph $G$, and by $m = |E(G)|$ the number of links/edges.
The adjacency matrix of $G(V, E)$ is defined as $A \in \Re^{n \times n}$, with elements
$[A]_{xy} = a_{xy} = a_{yx} = [A]_{yx} = w_{xy}$ if $x \neq y$ and $e_{xy} \in E(G)$ is an edge; $0$ otherwise. Here, the
weight $w_{xy}$ of the edge is a measure of affinity between nodes $x$ and $y$. Clearly, $A$ is real and symmetric.
The degree matrix $D$ is a diagonal matrix where $[D]_{xx} = d_{xx} = d(x) = \sum_{y \in V(G)} a_{xy}$ is the weighted
degree of node $x \in V(G)$: the sum of all edge weights (affinities) emanating from $x$. Also,
$vol(G) = \sum_{x \in V(G)} d(x)$ is called the volume of the graph $G$: the sum total of affinities between all
pairs of vertices in $G$. The combinatorial Laplacian of the graph is then given by:

$$L = D - A \tag{1}$$

It is easy to see, from the definition in (1) above, that the Laplacian $L$ is a real, symmetric and
doubly-centered matrix (each row/column sum is 0). More importantly, $L$ admits an eigen decomposition of the
form $L = \Phi \Lambda \Phi'$ where the columns of $\Phi$ constitute the set of orthogonal eigen vectors of $L$ and
$\Lambda$ is a diagonal matrix with $[\Lambda]_{ii} = \lambda_i : 1 \le i \le n$ being the $n$ eigen values of $L$. It is well
established that for a connected undirected graph $G(V, E)$, $L$ is positive semi-definite and has a unique
smallest eigen value $\lambda_1 = 0$. The rest of the $n - 1$ eigen values are all positive. Thus, $L$ is rank deficient
($rank(L) = n - 1 < n$) and consequently singular. Its inverse, in the usual sense, does not exist.
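
As a quick numerical illustration of these properties, the following Python sketch builds $L = D - A$ for a small graph and computes its row sums, spectrum and rank. The use of NumPy, the helper name `laplacian` and the example edge list are our own illustrative assumptions, not part of the method itself.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected weighted graph.

    edges: iterable of (x, y, w) with 0-based vertex labels and weight w > 0.
    """
    A = np.zeros((n, n))
    for x, y, w in edges:
        A[x, y] = A[y, x] = w          # symmetric adjacency matrix
    D = np.diag(A.sum(axis=1))         # weighted degrees on the diagonal
    return D - A

# Hypothetical example: a 4-cycle with one chord, unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
L = laplacian(4, edges)

row_sums = L.sum(axis=1)               # doubly centered: every row sums to 0
eigvals = np.linalg.eigvalsh(L)        # eigen values in ascending order
rank = np.linalg.matrix_rank(L)        # n - 1 for a connected graph
```

For a connected graph, `eigvals[0]` is zero up to round-off and all remaining eigen values are strictly positive, which is what makes $L$ singular.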

However, the Moore-Penrose pseudo-inverse of $L$, denoted henceforth by $L^+$, does exist and is unique.
The following constitute the basic properties of $L^+$ as a unique generalized inverse of $L$ [8]:

$$\text{a. } LL^+L = L \qquad \text{b. } L^+LL^+ = L^+ \qquad \text{c. } (LL^+)' = LL^+ \qquad \text{d. } (L^+L)' = L^+L \tag{2}$$

Like $L$, $L^+$ is also real, symmetric, doubly centered and positive semi-definite. Moreover, the eigen
decomposition of $L^+$ is given by $L^+ = \Phi \Lambda^+ \Phi'$, with the same set of orthogonal eigen vectors as that of
$L$. The set of eigen values of $L^+$, given by the diagonal of the matrix $\Lambda^+$, is composed of $\lambda_1^+ = 0$ and
the reciprocals of the positive eigen values of $L$. We denote by $l^+_{xy}$ the element in the $x^{th}$ row and
$y^{th}$ column of $L^+$ (a convention followed for all matrices henceforth). We emphasize that even when the
matrix $L$ is sparse (which is the case with real world networks), $L^+$ is always a full matrix. In fact, for a
connected graph, all the elements of $L^+$ are non-zero.

A straightforward approach for computing $L^+$ is through the eigen-decomposition of $L$, followed by an
inversion of its non-zero eigen values, and finally reassembling the matrix as discussed above. In practice,
however, mathematical software such as MATLAB uses singular value decomposition to compute the
pseudo-inverse of matrices (c.f. pinv in the standard library). This general SVD based method does not exploit
the special structural properties of $L$ and incurs $O(n^3)$ computational time, $n$ being the number of nodes
in the graph. An alternative has recently been proposed in [55] specifically for computing $L^+$ for a simple,
connected, undirected graph. A rank(1) perturbation of the matrix $L$ makes it invertible. $L^+$ can then be
computed from this perturbed matrix as follows:

$$L^+ = \left(L + \frac{1}{n} J\right)^{-1} - \frac{1}{n} J \tag{3}$$

where $J \in \Re^{n \times n}$ is a matrix of all 1's. Although the theoretical cost for this method is also $O(n^3)$, in
practice it works faster for graphs of arbitrary orders and edge densities than the standard pinv method. But the
proof of this pudding is in computing! So, to put the notion into context, we present a numerical analysis
over Erdos-Renyi graphs (ER-graphs) of varying orders and edge densities. An ER-graph is a random graph
determined by parameters $(n, p)$, where $n$ is the order of the graph and $p$ is the uniform probability for
the occurrence of any arbitrary (undirected) edge in the graph [11]. We use a dedicated machine with a
quad-core AMD Opteron processor (800 MHz/core) and 48 GB of primary memory.

Figure 1: Computational times: Erdos-Renyi graphs of varying orders and densities. PertInv: pseudo-inverse computed through
rank(1) perturbation [55]. PInv: pseudo-inverse computed through standard pinv in MATLAB.

Fig. 1 shows a comparison of the two methods for $p \in \{0.3, 0.5\}$ and $n \in \{1000, 2000, \dots, 10000\}$. The
experiment is repeated 100 times for each parametric combination $(n, p)$. The fact that the method from
[55] outperforms pinv is self evident, as is the fact that the computational times for both methods rise with
increasing values of $n$. We also observe great consistency (or very little variance) across the different instances
for a given $(n, p)$, which is not too surprising. What is of interest, however, is that the computational times
for a given value of $n$ do not vary significantly across $p \in \{0.3, 0.5\}$, for either of the two methods. We
observe the same for higher values of $p$ (not shown here). This implies that the methods are insensitive to the
sparsity of the graphs. Moreover, for graphs of $(n, p) = (10000, 0.5)$, the primary memory footprint for both
methods is over 2.0 GB when run in MATLAB (a little higher, in fact, for the perturbed inversion method).
Although the exact figures may vary from machine to machine, they provide a rough estimate that suffices
for the problem at hand. In summary, for dynamically changing graphs, in which small local modifications
occur every now and then, such methods would incur unduly heavy computational costs due to repeated
re-computation of L+ from scratch. On the other hand, for graphs of higher orders (n > O(IO^)), such 
decomposition/inversion based methods are rendered impractical from the point of view of computational 
time as well as memory requirements, if performed on a single machine. 
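
To make the rank(1) perturbation in (3) concrete, the sketch below computes $L^+$ both ways on a small random graph and compares the results. It is only an illustration under our own assumptions: NumPy's `numpy.linalg.pinv` stands in for the SVD-based pinv, the helper name `laplacian_pinv` is hypothetical, and the sample size is far smaller than in the experiments reported above.

```python
import numpy as np

def laplacian_pinv(L):
    """L+ via the rank(1) perturbation in (3): (L + J/n)^{-1} - J/n."""
    n = L.shape[0]
    J_over_n = np.ones((n, n)) / n
    return np.linalg.inv(L + J_over_n) - J_over_n

# Hypothetical Erdos-Renyi sample; (n, p) chosen so that the graph is connected
# with overwhelming probability.
rng = np.random.default_rng(0)
n, p = 60, 0.3
upper = np.triu(rng.random((n, n)) < p, k=1)   # independent coin flip per vertex pair
A = (upper | upper.T).astype(float)            # symmetric 0/1 adjacency matrix
L = np.diag(A.sum(axis=1)) - A
assert np.linalg.matrix_rank(L) == n - 1       # (3) assumes a connected graph

L_pert = laplacian_pinv(L)                     # perturbation-based pseudo-inverse
L_svd = np.linalg.pinv(L)                      # SVD-based pseudo-inverse
max_diff = np.abs(L_pert - L_svd).max()
```

Both routes are cubic in $n$; the sketch only checks that they agree numerically and says nothing about their relative running times.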

In what follows, we show that the computation of the Moore-Penrose pseudo-inverse of the Laplacian
can be done in a divide-and-conquer fashion. Our method allows efficient incremental updates of $L^+$ for
dynamically changing graphs, without having to compute $L^+$ all over again. Moreover, computing $L^+$ for
large graphs becomes feasible, in principle, through parallelization of (smaller) independent sub-problems
over multiple machines, which can then be re-combined at $O(n^2)$ cost per edge across a division (details in
a subsequent section). But first we need to establish a few more preliminary results to further motivate our
study.



2.2. Sub-Matrix Inverses of $L$

As described in the previous section, the combinatorial Laplacian $L$ of a connected graph $G(V, E)$ is
singular and thus non-invertible. However, given that its rank is $n - 1$, any combination of $n - 1$ columns
(or rows) of $L$ constitutes a linearly independent set. Hence, any $(n-1) \times (n-1)$ sub-matrix of $L$ is
invertible. Indeed, the inverses of such $(n-1) \times (n-1)$ sub-matrices are made use of in several graph
analysis problems: enumerating the spanning trees and spanning forests of the graph [32] and determining the
random-walk betweenness of the nodes of the graph [40], to name a few. However, the cost of computing an
$(n-1) \times (n-1)$ sub-matrix inverse is still $O(n^3)$. To compute all such sub-matrix inverses amounts to a
time complexity of $O(n^4)$. In the following, we show how they can be computed efficiently through $L^+$.





Figure 2: A simple graph G and its EEN. 



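Theorem 1. Let $L(\{\overline{n}\}, \{\overline{n}\})$ be an $(n-1) \times (n-1)$ sub-matrix of $L$ formed by removing the $n^{th}$ row and
$n^{th}$ column of $L$. Then $\forall (x, y) \in V(G) \times V(G)$:

$$[L(\{\overline{n}\}, \{\overline{n}\})^{-1}]_{xy} = l^+_{xy} - l^+_{xn} - l^+_{ny} + l^+_{nn} \tag{4}$$
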
The result in Theorem 1 above expresses, in scalar form, the general element ($x^{th}$ row, $y^{th}$ column) of the
inverse of the sub-matrix $L(\{\overline{n}\}, \{\overline{n}\})$ in terms of the elements of $L^+$, as claimed. As the choice of the $n^{th}$
row and column is arbitrary, we can see that the result holds in general for any $(n-1) \times (n-1)$ sub-matrix
(permuting the rows and columns of $L$ as per need). The cost of computing $L(\{\overline{n}\}, \{\overline{n}\})^{-1}$ for a given vertex
$n$ is $O(n^2)$. Therefore, all sub-matrix inverses can be computed in $O(n^3)$ time from $L^+$, which itself can
be computed in $O(n^3)$ time, even if the standard methods are used. This is clearly an order of magnitude
improvement. Henceforth, we focus entirely on $L^+$.
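
The relation between a sub-matrix inverse and $L^+$ can be checked numerically. The sketch below assumes the standard identity $[L(\{\overline{v}\}, \{\overline{v}\})^{-1}]_{xy} = l^+_{xy} - l^+_{xv} - l^+_{vy} + l^+_{vv}$ for the removed vertex $v$; this specific form, the helper name `submatrix_inverse_from_pinv` and the example graph are our assumptions for illustration.

```python
import numpy as np

def submatrix_inverse_from_pinv(Lp, v):
    """Inverse of L with row/column v removed, assembled from L+ in O(n^2)."""
    keep = [u for u in range(Lp.shape[0]) if u != v]
    core = Lp[np.ix_(keep, keep)]      # l+_{xy}
    col = Lp[keep, v][:, None]         # l+_{xv}, broadcast along rows
    row = Lp[v, keep][None, :]         # l+_{vy}, broadcast along columns
    return core - col - row + Lp[v, v]

# Hypothetical example: the path 0-1-2-3 plus the edge (1, 3), unit weights.
A = np.zeros((4, 4))
for x, y in [(0, 1), (1, 2), (2, 3), (1, 3)]:
    A[x, y] = A[y, x] = 1.0
L = np.diag(A.sum(axis=1)) - A
Lp = np.linalg.pinv(L)

v = 3
keep = [0, 1, 2]
direct = np.linalg.inv(L[np.ix_(keep, keep)])    # O(n^3) per removed vertex
via_pinv = submatrix_inverse_from_pinv(Lp, v)    # O(n^2) per removed vertex
```

Once $L^+$ is available, repeating the second call for every vertex $v$ costs $O(n^3)$ in total, compared with $O(n^4)$ for $n$ direct inversions.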



2.3. The Effective Resistance Distance and $L^+$

An interesting analogy exists between graphs and resistive electrical circuits [20, 28, 33]. Given a simple,
connected and undirected graph $G(V, E)$, the equivalent electrical network (EEN) of the graph can be
formed by replacing each edge $e_{ij} \in E(G)$, of weight $w_{ij}$, with an electrical resistance $\omega_{ij} = w_{ij}^{-1}$ ohm (c.f.
Fig. 2). A distance function can then be defined between any pair of nodes $(x, y) \in V(G) \times V(G)$ in the
resulting EEN as follows:

Definition 1. Effective Resistance ($\Omega_{xy}$): The voltage developed between nodes $x$ and $y$, when a unit current
is injected at node $x$ and is withdrawn at node $y$.

It is well established that the square root of the effective resistance distance ($\sqrt{\Omega_{xy}}$) is a Euclidean metric
with interesting applications [28, 33]. Amongst other things, the effective resistance determines the expected
length of random commutes between node pairs in the graph: $C_{xy} = vol(G)\, \Omega_{xy}$ [52]. More importantly,
$\Omega_{xy}$ can be expressed in terms of the elements of $L^+$ as follows:

$$\Omega_{xy} = l^+_{xx} + l^+_{yy} - l^+_{xy} - l^+_{yx} \tag{5}$$



We now invert the elegant form in (5) to derive an important result in the following lemma, which gives us
the general term of $L^+$ in terms of the distance function $\Omega$.



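Lemma 1. $\forall (x, y) \in V(G) \times V(G)$:

$$l^+_{xy} = \frac{1}{2n} \sum_{z=1}^{n} \left(\Omega_{xz} + \Omega_{zy} - \Omega_{xy}\right) - \frac{1}{2n^2} \sum_{z=1}^{n} \sum_{w=1}^{n} \Omega_{zw} \tag{6}$$
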
The RHS in Lemma 1 above is composed of two terms: a triangle inequality of effective resistances [52]
and a double summand over all pairwise effective resistances in the EEN. It is easy to see that the double
summand simply reduces to a scalar multiple of the trace of $L^+$ ($Tr(L^+) = \sum_{i=1}^{n} l^+_{ii}$). Thus the functional
half that determines the elements of $L^+$ is the triangle inequality of the effective resistances, while the
double summand contributes an additive constant to all the entries of $L^+$. We illustrate the utility of this
result with the help of two kinds of graphs on the extremal ends of the connectedness spectrum: the star
and the clique.¹

Figure 3: The Star Graph: Pre-computed $L^+$ for n -
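
The round trip between $L^+$ and $\Omega$ can be sketched in a few lines of Python. The forward map follows (5); the inverse map is written as a triangle term minus a constant derived from the double sum, a form we assume here because it is consistent with the description of Lemma 1. The function names and the example graph are hypothetical.

```python
import numpy as np

def resistance_from_pinv(Lp):
    """Omega_xy = l+_xx + l+_yy - l+_xy - l+_yx, as in (5)."""
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - Lp - Lp.T

def pinv_from_resistance(Om):
    """Recover L+ from Omega: triangle term minus a constant (assumed form)."""
    n = Om.shape[0]
    row = Om.sum(axis=1)                                        # sum_z Omega_xz
    triangle = (row[:, None] + row[None, :] - n * Om) / (2 * n)
    constant = Om.sum() / (2 * n * n)                           # double sum over all pairs
    return triangle - constant

# Hypothetical example: a 5-cycle with one chord, unit weights.
n = 5
A = np.zeros((n, n))
for x, y in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]:
    A[x, y] = A[y, x] = 1.0
Lp = np.linalg.pinv(np.diag(A.sum(axis=1)) - A)

Om = resistance_from_pinv(Lp)
Lp_back = pinv_from_resistance(Om)
double_sum = Om.sum()          # a scalar multiple of the trace: 2 * n * Tr(L+)
```

The final line reflects the observation that the double sum carries no pair-specific information: it only shifts every entry of $L^+$ by the same constant.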



2.3.1. The Star 

A star $S_n$ of order $n$ is a tree with exactly one vertex of degree $n - 1$, referred to as the root, and $n - 1$
pendant vertices each of degree 1, called leaves (c.f. Fig. 3). By definition, a singleton isolated vertex is also
a degenerate star, albeit with no leaves. It is easy to see that $S_n$, being a tree, is the most sparse connected
graph of order $n$ (with exactly $n - 1$ edges). Also, $S_n$ is the most compact tree of its order (lowest diameter).
In the following, we show how $L^+_{S_n}$ can be computed using the result of Lemma 1.

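Corollary 1. For a star graph $S_n$ of order $n$, with node $v_1$ as root and nodes $\{v_2, v_3, \dots, v_n\}$ as leaves, $L^+_{S_n}$
is given by:

$$l^+_{11} = \frac{n-1}{n^2} \quad \text{and} \quad \forall x : 2 \le x \le n, \; l^+_{1x} = l^+_{x1} = -\frac{1}{n^2} \tag{7}$$

$$\forall x : 2 \le x \le n, \; l^+_{xx} = \frac{n^2 - n - 1}{n^2} \quad \text{and} \quad \forall x \ne y : 2 \le x, y \le n, \; l^+_{xy} = -\frac{n+1}{n^2} \tag{8}$$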



2.3.2. The Clique 

On the other end of the connectedness spectrum lies the clique. A clique $K_n$ of order $n$ is a complete
graph with $\frac{n(n-1)}{2}$ edges. Clearly, the clique is the densest possible graph of order $n$, as there is a direct edge
between any pair of vertices in it. It is also the most compact graph of its order (lowest diameter). Then,

Corollary 2. For a clique $K_n$ of order $n$, $L^+_{K_n}$ is given by:

$$\forall x : 1 \le x \le n, \; l^+_{xx} = \frac{n-1}{n^2} \quad \text{and} \quad \forall x \ne y : 1 \le x, y \le n, \; l^+_{xy} = l^+_{yx} = -\frac{1}{n^2} \tag{9}$$
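
The closed forms for the star and the clique can be written down directly and compared with a generic pseudo-inverse. The sketch below does so for unweighted graphs; the entries used are $(n-1)/n^2$, $-1/n^2$, $(n^2-n-1)/n^2$ and $-(n+1)/n^2$ for the star, and $(n-1)/n^2$ and $-1/n^2$ for the clique. The function names, the 0-based labelling with vertex 0 as the root, and the choice $n = 6$ are our assumptions.

```python
import numpy as np

def star_pinv(n):
    """Closed-form L+ of the unweighted star S_n, vertex 0 being the root."""
    Lp = np.full((n, n), -(n + 1) / n**2)           # leaf-leaf entries, x != y
    np.fill_diagonal(Lp, (n * n - n - 1) / n**2)    # leaf diagonal entries
    Lp[0, :] = Lp[:, 0] = -1 / n**2                 # root-leaf entries
    Lp[0, 0] = (n - 1) / n**2                       # root diagonal entry
    return Lp

def clique_pinv(n):
    """Closed-form L+ of the unweighted clique K_n."""
    return np.eye(n) / n - np.ones((n, n)) / n**2   # diagonal (n-1)/n^2, rest -1/n^2

n = 6
A_star = np.zeros((n, n))
A_star[0, 1:] = A_star[1:, 0] = 1.0                 # root 0 adjacent to every leaf
L_star = np.diag(A_star.sum(axis=1)) - A_star
L_clique = n * np.eye(n) - np.ones((n, n))          # Laplacian of K_n

star_matches = np.allclose(star_pinv(n), np.linalg.pinv(L_star))
clique_matches = np.allclose(clique_pinv(n), np.linalg.pinv(L_clique))
```

Each entry depends only on $n$ and on whether the indices refer to the root, a leaf or a diagonal position, so any single entry can be produced in constant time.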



The results in the corollaries presented above are not just illustrative examples. They are also of interest
from a computational point of view, particularly when the graph under study is an unweighted one. Both
stars and cliques can occur as motif sub-graphs in any given graph. Indeed, for any non-trivial connected
simple graph of order $n \ge 3$, there is at least one sub-graph that is a star. Selecting any vertex $i$ with
$d(i) \ge 2$, and conducting a one-hop breadth first search, generates a star sub-graph. Cliques, though not so
universal, also occur in real world networks (e.g. citation networks). Therefore, in any divide-and-conquer
methodology, both stars and cliques are likely to be found at some stage. We have already established that
the cost of computing $L^+_{S_n}$ and $L^+_{K_n}$ is $O(1)$ per element (as they are determined entirely by $n$) and hence
the solution to such a sub-problem, when found, is obtained at the lowest possible cost: a true practical gain.

¹The graphs in these examples are assumed to be unweighted, i.e. all edges have a unit resistance/conductance.

Figure 4: Divide-and-Conquer: Connected bi-partition of a graph and the two-stage process: first join followed by three edge
firings. The dotted lines represent the edges that are not part of the intermediate graph at that stage.

To conclude, we have demonstrated that there exists a relationship between the elements of $L^+$ and
the pairwise effective resistances in the graph $G(V, E)$ that yields interesting closed form solutions for the
pseudo-inverse for special graphs such as stars and cliques. In the subsequent sections, we demonstrate that
it can be used to compute $L^+$ for general graphs as well, incrementally, in a divide-and-conquer fashion.



3. From Two to One: Computing $L^+$ by Partitions

In this section, we present our main result: the computation of the Moore-Penrose pseudo-inverse of
the Laplacian, or $L^+$, by means of graph bi-partitions. In §3.1, we lay out a two-stage process, the first join
followed by edge firings, that underpins our methodology. We then provide specific closed form solutions
for both stages in §3.2.



3.1. Connected Bi-Partitions of a Graph and the Two-Stage Process

In order to compute the Moore-Penrose pseudo-inverse of the Laplacian of a simple, connected, undirected
and unweighted graph $G(V, E)$ by parts, we must first establish that the problem can be decomposed into
two, or more, sub-problems that can be solved independently. The solutions to the independent sub-problems
can then be combined to obtain the overall result. But before we proceed to do so, some notation is in
order.

Definition 2. Connected Bi-partition ($P = (G_1, G_2)$): A cut of the graph $G$ which contains exactly two
mutually exclusive and exhaustive connected sub-graphs $G_1$ and $G_2$.

Fig. 4(a-b) shows a graph $G(V, E)$ and a connected bi-partition $P(G_1, G_2)$ of it, obtained from the
graph $G(V, E)$ by removing the set of dotted edges shown. Each partition $P(G_1, G_2)$ has certain defining
characteristics in terms of the set of vertices as well as the set of edges in the graph. Let $V_1(G_1)$
and $V_2(G_2)$ be the mutually exclusive and exhaustive subsets of $V(G)$, i.e. $V_1(G_1) \cap V_2(G_2) = \emptyset$ and
$V_1(G_1) \cup V_2(G_2) = V(G)$. Similarly, let $E_1(G_1)$ and $E_2(G_2)$ be the sets of edges in the respective sub-graphs
$G_1$ and $G_2$ of $P$, and $E(G_1, G_2)$ the set of edges that violate the partition $P(G_1, G_2)$, i.e. have one end in
$G_1$ and the other in $G_2$. Thus, $E_1(G_1) \cap E_2(G_2) = E_1(G_1) \cap E(G_1, G_2) = E(G_1, G_2) \cap E_2(G_2) = \emptyset$ and
$E_1(G_1) \cup E(G_1, G_2) \cup E_2(G_2) = E(G)$. We denote by $\mathcal{P}(G)$ the set of all such connected bi-partitions of
the graph $G(V, E)$.
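
The definitions above translate into a simple check. The sketch below tests whether a given vertex subset $V_1$ induces a connected bi-partition and, if so, returns the edge sets $E_1(G_1)$, $E_2(G_2)$ and the cut set $E(G_1, G_2)$. How a good partition is chosen is not addressed here; the function names and the example graph are hypothetical.

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth first search connectivity test on the sub-graph induced by `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return False
    adj = {v: [] for v in vertices}
    for x, y in edges:
        if x in vertices and y in vertices:
            adj[x].append(y)
            adj[y].append(x)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices

def connected_bipartition(all_vertices, edges, V1):
    """Return (E1, E2, cut) if V1 and its complement are both connected, else None."""
    V1 = set(V1)
    V2 = set(all_vertices) - V1
    if not (is_connected(V1, edges) and is_connected(V2, edges)):
        return None
    E1 = [e for e in edges if e[0] in V1 and e[1] in V1]
    E2 = [e for e in edges if e[0] in V2 and e[1] in V2]
    cut = [e for e in edges if (e[0] in V1) != (e[1] in V1)]   # one end on each side
    return E1, E2, cut

# Hypothetical example: two triangles joined by two edges.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3), (1, 4)]
result = connected_bipartition(range(6), edges, {0, 1, 2})
rejected = connected_bipartition(range(6), edges, {0, 3})      # 0 and 3 are not adjacent
```

The three returned lists are pairwise disjoint and together cover all edges of the graph, mirroring the set identities stated above.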

It is easy to see that for an arbitrary connected bi-partition $P(G_1, G_2) \in \mathcal{P}(G)$ both $G_1$ and $G_2$ are
themselves simple, connected, undirected and unweighted graphs. Hence, the discussion in §2 is applicable
in its entirety to the sub-graphs $G_1$ and $G_2$ independently. Note then that $L^+_{G_1}$ and $L^+_{G_2}$, the Moore-Penrose
pseudo-inverses of the Laplacians of the sub-graphs $G_1$ and $G_2$, must, by definition, exist. The pair
$\{L^+_{G_1}, L^+_{G_2}\}$ constitutes the solution to two independent sub-problems represented in the set $\{G_1, G_2\}$. All that
remains to be shown now is that $\{L^+_{G_1}, L^+_{G_2}\}$ can indeed be combined to obtain $L^+_G$. It is this aspect of the
methodology that we call the two-stage process, as explained in detail below.

The original graph $G(V, E)$ can be thought of, in some sense, as a bringing together of the disjoint
spanning sub-graphs $G_1$ and $G_2$, by means of introducing the edges of the set $E(G_1, G_2)$. Starting from
$G_1$ and $G_2$, we iterate over the set of edges in $E(G_1, G_2)$ in the following fashion (c.f. Fig. 4 for a visual
reference). Let $e_{ij} \in E(G_1, G_2) : i \in V_1(G_1), j \in V_2(G_2)$, of weight $w_{ij}$ and resistance $\omega_{ij} = w_{ij}^{-1}$ ohm,
be an arbitrary edge chosen during the first iteration as shown in Fig. 4(c). We call this step the first
join in our two-stage process, whereafter $G_1$ and $G_2$ come together to give an intermediate connected
spanning sub-graph (say $G_3(V_3, E_3)$). The first join represents a point of singularity in the reconstruction
process, particularly from the perspective of the effective resistance distance. Note that before the first
join, the effective resistance distance between an arbitrary pair of nodes $(x, y) \in V(G) \times V(G)$ is infinity if
$x \in V_1(G_1)$ and $y \in V_2(G_2)$, as there is no path connecting $x$ and $y$. However, once the first edge $e_{ij}$ has
been introduced during the first join, this discrepancy no longer exists and all pairwise effective resistances
are finite. Precisely, if $\Omega^{G_1} : V_1(G_1) \times V_1(G_1) \to \Re^+$ and $\Omega^{G_2} : V_2(G_2) \times V_2(G_2) \to \Re^+$ are the pairwise
effective resistances defined over the sub-graphs $G_1$ and $G_2$, the following holds:

$$\Omega^{G_3}_{xy} = \begin{cases} \Omega^{G_1}_{xy}, & \text{if } x, y \in V_1(G_1) \\ \Omega^{G_2}_{xy}, & \text{if } x, y \in V_2(G_2) \\ \Omega^{G_1}_{xi} + \omega_{ij} + \Omega^{G_2}_{jy}, & \text{if } x \in V_1(G_1) \ \& \ y \in V_2(G_2) \end{cases}$$

Needless to say, this is a critical step in the process as we need finite values of effective resistances in order to
exploit the result in Lemma 1. Hereafter, we can combine the solutions to the independent sub-problems, i.e.
$L^+_{G_1}$ and $L^+_{G_2}$, to obtain $L^+_{G_3}$. Indeed, we obtain an elegant scalar form with interesting properties (details
in subsequent sections).

Following the first join, the remaining edges in $E(G_1, G_2)$ can now be introduced one at a time to
obtain a sequence of intermediate graphs ($G_4 \to G_5 \to G_6$) which finally ends in $G(V, E)$ (c.f. Fig. 4(d-f)).
We call this second stage of edge introductions, following the first join, edge firing. In terms of effective
resistances, each edge firing simply creates parallel resistive connections, or alternative paths, in the graph.
Algebraically, each edge firing is a rank(1) perturbation of the Laplacian for the intermediate graph from
the previous step. Thus, the Moore-Penrose pseudo-inverses of the Laplacians for the intermediate graph
sequence ($G_4 \to G_5 \to G_6$) can be obtained starting from $L^+_{G_3}$ using standard perturbation methods [36]
(details in subsequent sections).

To summarize, therefore, during the two-stage process we obtain a sequence of connected spanning
sub-graphs of $G(V, E)$ starting from a partition $P(G_1, G_2) \in \mathcal{P}(G)$, performing the first join by arbitrarily
selecting an edge $e_{ij} \in E(G_1, G_2)$, and then firing the remaining edges, one after the other, in any arbitrary
order. The number of connected spanning sub-graphs of $G(V, E)$ constructed during the two-stage process
is exactly $|E(G_1, G_2)|$ (= 4 for the example in Fig. 4). Note that the order in which these sub-graphs are
generated is of no consequence whatsoever. Next, we use these insights to obtain $L^+$ for the intermediate
graphs in the sequence.
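
Structurally, the two-stage process is a walk over the cut set. The following sketch enumerates the intermediate edge sets only; it performs no linear algebra. The generator name `two_stage_sequence` and the example edge lists are hypothetical.

```python
def two_stage_sequence(E1, E2, cut):
    """Yield (stage, edge_list) for G_3, G_4, ... in the order the cut edges are given."""
    current = list(E1) + list(E2)          # edges of the disjoint pair {G_1, G_2}
    for step, e in enumerate(cut):
        current = current + [e]            # re-introduce one cut edge
        stage = "first join" if step == 0 else "edge firing"
        yield stage, list(current)

# Hypothetical example: two triangles and a cut set of four edges.
E1 = [(0, 1), (1, 2), (2, 0)]
E2 = [(3, 4), (4, 5), (5, 3)]
cut = [(2, 3), (1, 4), (0, 5), (2, 5)]
stages = list(two_stage_sequence(E1, E2, cut))
```

With a cut set of four edges the sequence has four members: one first join followed by three edge firings, the last of which restores the full edge set of $G$.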
Figure 5: The First Join: Scalar mapping of $(L^+_{G_1}, L^+_{G_2})$ to $L^+_{G_3}$. The grey blocks represent relevant elements in $L^+_{G_1}$, $L^+_{G_2}$ and
$L^+_{G_3}$. Arrows span the elements of the upper triangular of $L^+_{G_3}$ that contribute to the respective diagonal element pointed to.

3.2. The Two-Stage Process and $L^+$

We now present the closed form solutions for the Moore-Penrose pseudo-inverse of the Laplacians of the 
set of intermediate graphs obtained during the two-stage process. 



3.2.1. The First Join 

Given two simple, connected, undirected graphs $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$, let $L^+_{G_1}$ and $L^+_{G_2}$ be the
respective Moore-Penrose pseudo-inverses of their Laplacians. Also, let $n_1 = |V_1(G_1)|$ and $n_2 = |V_2(G_2)|$ be
the orders of the two graphs. We denote by $l^{+(1)}_{xy}$ and $l^{+(2)}_{xy}$ respectively the general terms of the matrices $L^+_{G_1}$
and $L^+_{G_2}$. Next, let the first join between $G_1$ and $G_2$ be performed by introducing an edge $e_{ij}$ between the
graphs $G_1$ and $G_2$ to obtain $G_3(V_3, E_3)$, where $i \in V_1(G_1)$ and $j \in V_2(G_2)$. Clearly, $V_3(G_3) = V_1(G_1) \cup V_2(G_2)$
and $E_3(G_3) = E_1(G_1) \cup \{e_{ij}\} \cup E_2(G_2)$. Thus, $|V_3(G_3)| = n_3 = n_1 + n_2$ and $|E_3(G_3)| = m_3 = m_1 + 1 + m_2$.
By convention, the vertices in $V_3(G_3)$ are labelled in the following order: the first $n_1$ vertices $\{1, 2, \dots, n_1\}$ are
retained, as is, from $V_1(G_1)$ and the remaining $n_2$ vertices are labelled $\{n_1 + 1, n_1 + 2, \dots, n_1 + n_2\}$ in order
from $V_2(G_2)$. We denote by $L^+_{G_3}$ the pseudo-inverse and by $l^{+(3)}_{xy}$ its general term. Then,

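Theorem 2. $\forall (x, y) \in V_3(G_3) \times V_3(G_3)$,

$$l^{+(3)}_{xy} = \begin{cases}
l^{+(1)}_{xy} - \dfrac{n_2 n_3 \left(l^{+(1)}_{xi} + l^{+(1)}_{iy}\right) - n_2^2 \left(l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij}\right)}{n_3^2}, & \text{if } x, y \in V_1(G_1) \\[2ex]
l^{+(2)}_{xy} - \dfrac{n_1 n_3 \left(l^{+(2)}_{xj} + l^{+(2)}_{jy}\right) - n_1^2 \left(l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij}\right)}{n_3^2}, & \text{if } x, y \in V_2(G_2) \\[2ex]
\dfrac{n_3 \left(n_1 l^{+(1)}_{xi} + n_2 l^{+(2)}_{jy}\right) - n_1 n_2 \left(l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij}\right)}{n_3^2}, & \text{if } x \in V_1(G_1) \ \& \ y \in V_2(G_2)
\end{cases}$$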



The result in Theorem 2 is interesting for several reasons. First and foremost, it clearly shows that the
general term of $L^+_{G_3}$ is a linear combination of the elements of $L^+_{G_1}$ and $L^+_{G_2}$. This was indeed our principal
claim. Secondly, $\forall (x, y) \in V_3(G_3) \times V_3(G_3)$, each individual $l^{+(3)}_{xy}$ can be computed independent of the others
(barring symmetry, i.e. $l^{+(3)}_{xy} = l^{+(3)}_{yx}$, which we shall discuss shortly). They are determined entirely by the
specific elements from the $i^{th}$ and $j^{th}$ columns of the matrices $L^+_{G_1}$ and $L^+_{G_2}$, depending upon the membership
of $x$ and $y$ in the disjoint graphs. This implies that all $l^{+(3)}_{xy}$ can be computed in parallel, as long as we have
the relevant elements of $L^+_{G_1}$ and $L^+_{G_2}$.



From a cost point of view, the first join requires $O(1)$ computations per element in $L^+_{G_3}$ (a constant
number of $\{+, -, \times, /\}$ operations) if $\{L^+_{G_1}, L^+_{G_2}\}$ is given a priori. The common term in the numerator,
i.e. $(l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij})$, is an invariant for the elements of $L^+_{G_3}$ and need only be computed once. This term
is simply a linear multiple of the change in trace:

$$\Delta(Tr) = Tr(L^+_{G_3}) - [Tr(L^+_{G_1}) + Tr(L^+_{G_2})] \tag{10}$$

For details see the proof of Lemma 5 in the Appendix. Therefore, we achieve an overall cost of $O(n_3^2)$ for the
first join. Last but not the least, we need to compute and store only the upper triangular of $L^+_{G_3}$. Owing to
the symmetry of $L^+_{G_3}$, the lower triangular is determined automatically. As for the diagonal elements, they
come without any additional cost as a result of $L^+_{G_3}$ being doubly-centered (c.f. Fig. 5).
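
A minimal sketch of the first join is given below, written in vectorized form under the assumption that the three cases of Theorem 2 are: $l^{+(1)}_{xy} - [n_2 n_3 (l^{+(1)}_{xi} + l^{+(1)}_{iy}) - n_2^2 c]/n_3^2$ within $G_1$, the symmetric expression within $G_2$, and $[n_3 (n_1 l^{+(1)}_{xi} + n_2 l^{+(2)}_{jy}) - n_1 n_2 c]/n_3^2$ across the two, with $c = l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij}$. The function names, the 0-based labels and the example graphs are ours; `numpy.linalg.pinv` is used only to obtain the inputs and an independent reference.

```python
import numpy as np

def first_join(Lp1, Lp2, i, j, omega=1.0):
    """L+ of G3 = G1 + G2 + bridge e_ij, from L+ of G1 and G2 (0-based local labels)."""
    n1, n2 = Lp1.shape[0], Lp2.shape[0]
    n3 = n1 + n2
    c = Lp1[i, i] + Lp2[j, j] + omega      # common term, computed once
    a = Lp1[:, i]                          # i-th column of L+ for G1
    b = Lp2[:, j]                          # j-th column of L+ for G2
    Lp3 = np.empty((n3, n3))
    Lp3[:n1, :n1] = Lp1 - (n2 * n3 * (a[:, None] + a[None, :]) - n2**2 * c) / n3**2
    Lp3[n1:, n1:] = Lp2 - (n1 * n3 * (b[:, None] + b[None, :]) - n1**2 * c) / n3**2
    cross = (n3 * (n1 * a[:, None] + n2 * b[None, :]) - n1 * n2 * c) / n3**2
    Lp3[:n1, n1:] = cross
    Lp3[n1:, :n1] = cross.T                # symmetry gives the lower block
    return Lp3

def laplacian(n, edges):
    """Unweighted Laplacian from a list of 0-based edges."""
    A = np.zeros((n, n))
    for x, y in edges:
        A[x, y] = A[y, x] = 1.0
    return np.diag(A.sum(axis=1)) - A

# Hypothetical example: a triangle G1 joined to a 4-vertex star G2.
E1 = [(0, 1), (1, 2), (0, 2)]
E2 = [(0, 1), (0, 2), (0, 3)]              # local labels, root 0
i, j, n1, n2 = 2, 1, 3, 4                  # bridge between vertex 2 of G1 and leaf 1 of G2
E3 = E1 + [(x + n1, y + n1) for x, y in E2] + [(i, j + n1)]

Lp1 = np.linalg.pinv(laplacian(n1, E1))
Lp2 = np.linalg.pinv(laplacian(n2, E2))
Lp3 = first_join(Lp1, Lp2, i, j)
Lp3_direct = np.linalg.pinv(laplacian(n1 + n2, E3))
delta_trace = np.trace(Lp3) - (np.trace(Lp1) + np.trace(Lp2))
```

Each block is filled by element-wise arithmetic on one column of each input, so the work is proportional to the number of entries of the result, and `delta_trace` equals $n_1 n_2 c / n_3$ in this construction.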

3.2.2. Firing an Edge 

We now look at the second stage, that of firing an edge in a connected graph. Given a simple, connected,
undirected graph $G_1(V_1, E_1)$, let $e_{ij} \notin E_1(G_1)$ be fired to obtain $G_2(V_2, E_2)$. Clearly, $V_2(G_2) = V_1(G_1)$ and
$E_2(G_2) = E_1(G_1) \cup \{e_{ij}\}$. Continuing with our convention, we denote by $L^+_{G_1}$ and $L^+_{G_2}$ the Moore-Penrose
pseudo-inverses of the respective Laplacians. Then,

Theorem 3. \f{x,y) e V2{G2) x V2{G2), 

.+(2) _ ,+(1) _ V" '"J ) V^y 'ly J (. . ^ 

''xy -'■XV , nCi ^ ' 

UJij + \L^j 

where il^,^ is the effective resistance distance between nodes i and j in the graph 6*1(^1, Ei) — an invariant 
V(x, J/) e V3(G'3) X V3(G'3) that is determined entirely by the end-points of the edge e^ being fired. Once 
again, we observe that the general term of L^ is a linear combination of the elements of Ljt and requires 
0(1) computations per element in L^ — constant number of {-I-, — , x , /} operations — if L|t is given a 
priori. The rest of the discussion from the preceding sub-section on first join — element- wise independence 
and upper triangular sufficiency — holds as is for this stage too. However, before concluding this section, 
we extend the result of Theorem |3] to the pairwise effective resistances themselves in the following corollary. 

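Corollary 3. $\forall (x, y) \in V_2(G_2) \times V_2(G_2)$,

\[ \Omega^{G_2}_{xy} = \Omega^{G_1}_{xy} - \frac{\left[\left(\Omega^{G_1}_{xj} - \Omega^{G_1}_{xi}\right) - \left(\Omega^{G_1}_{yj} - \Omega^{G_1}_{yi}\right)\right]^2}{4\left(\omega_{ij} + \Omega^{G_1}_{ij}\right)} \tag{12} \]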



The result above is interesting in its own right. Note that computing $\Omega^{G_2}_{xy}$ when the edge density of a graph
increases (or the expected commute times in random walks between nodes) is pertinent to many application
scenarios 12, llJ, l27|, l29l l28|, ISSl, |47|, |48| . Corollary 3 gives us a way of computing these distances directly
without having to compute $L^+_{G_2}$ first.
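The two updates for a fired edge can be checked on a small graph. The sketch below is illustrative only: the helper names (`laplacian`, `resistances`, `fire_edge`, `fire_edge_resistances`) are hypothetical, and a unit edge resistance is assumed by default. It applies the rank-one update of Theorem 3 to $L^+$ and the resistance-only update of Corollary 3, and compares both against direct computation.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def resistances(P):
    """Pairwise effective resistances, Omega_xy = l_xx + l_yy - 2 l_xy."""
    d = np.diag(P)
    return d[:, None] + d[None, :] - 2.0 * P

def fire_edge(P, i, j, omega=1.0):
    """Pseudo-inverse after adding edge (i, j) of resistance omega."""
    d = P[:, i] - P[:, j]                          # l_xi - l_xj for every x
    omega_eff = P[i, i] + P[j, j] - 2.0 * P[i, j]  # Omega_ij in G1
    return P - np.outer(d, d) / (omega + omega_eff)

def fire_edge_resistances(R, i, j, omega=1.0):
    """Resistance matrix after adding edge (i, j), without touching L+."""
    g = R[:, j] - R[:, i]                          # Omega_xj - Omega_xi for every x
    return R - (g[:, None] - g[None, :]) ** 2 / (4.0 * (omega + R[i, j]))

# Path 0-1-2-3; firing (0, 3) closes it into a 4-cycle.
path = [(0, 1), (1, 2), (2, 3)]
P1 = np.linalg.pinv(laplacian(4, path))
P2 = fire_edge(P1, 0, 3)
R2 = fire_edge_resistances(resistances(P1), 0, 3)

P2_ref = np.linalg.pinv(laplacian(4, path + [(0, 3)]))
print(np.allclose(P2, P2_ref), np.allclose(R2, resistances(P2_ref)))
```

In this example the resistance between nodes 0 and 3 drops from 3 (three unit edges in series) to 3/4 (three in series, in parallel with one), as expected for the cycle.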

To conclude, therefore, we have established that the Moore-Penrose pseudo-inverses of the Laplacians of 
all the intermediate graphs, generated during the two-stage process, are incrementally computable from the 
solutions at the preceding stage, on an element-to-element basis. We shall return to specific applications of 
these results to dynamic (time-evolving) graphs and large graphs in general, in a subsequent section. But 
first, for the sake of completeness, we present the case of structural regress. 
4. From One to Two: A Case of Regress

We now present analogous results in the opposite direction, that of structural regress of a graph through
successive deletion of edges until the graph breaks into two. These results, similar in essence to those
presented in the preceding section, are particularly significant with respect to dynamically evolving graphs
that change with time (e.g. social networks). Once again, we have two cases to address with respect to edge
deletions, viz. (a) Non-bridge edge: an edge that upon deletion does not affect the connectedness of the graph
(c.f. §4.1); and (b) Bridge edge: an edge that, when deleted, yields a connected bi-partition of the graph
(c.f. §4.2).



4.1. Deleting a Non-Bridge Edge

Given a simple, connected, undirected graph $G_1(V_1, E_1)$, let $e_{ij} \in E_1(G_1)$ be a non-bridge edge that is
deleted to obtain $G_2(V_2, E_2)$. Clearly, $V_2(G_2) = V_1(G_1)$ and $E_2(G_2) = E_1(G_1) - \{e_{ij}\}$. Once again, we
denote by $L^+_{G_1}$ and $L^+_{G_2}$ the Moore-Penrose pseudo-inverses of the respective Laplacians. Then,

Theorem 4. $\forall (x, y) \in V_2(G_2) \times V_2(G_2)$,

\[ l^{+(2)}_{xy} = l^{+(1)}_{xy} + \frac{\left(l^{+(1)}_{xi} - l^{+(1)}_{xj}\right)\left(l^{+(1)}_{iy} - l^{+(1)}_{jy}\right)}{\omega_{ij} - \Omega^{G_1}_{ij}} \tag{13} \]

Note, as $e_{ij}$ is a non-bridge edge, $\Omega^{G_1}_{ij} \neq 1$. In fact, given that $G_1(V_1, E_1)$ is connected, undirected and
unweighted, we have: $0 < \Omega^{G_1}_{ij} < 1$. Also, as in the case of the two-stage process, we observe the same
element-wise independence for $L^+_{G_2}$ here as well. Once again, if the quantity of interest is $\Omega^{G_2}_{xy}$ or pairwise
expected commute times in random walks, we can simply use the following corollary.



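Corollary 4. $\forall (x, y) \in V_2(G_2) \times V_2(G_2)$,

\[ \Omega^{G_2}_{xy} = \Omega^{G_1}_{xy} + \frac{\left[\left(\Omega^{G_1}_{xj} - \Omega^{G_1}_{xi}\right) - \left(\Omega^{G_1}_{yj} - \Omega^{G_1}_{yi}\right)\right]^2}{4\left(\omega_{ij} - \Omega^{G_1}_{ij}\right)} \tag{14} \]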
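A small numerical check of the non-bridge deletion follows. It is a sketch under stated assumptions: the function name `delete_non_bridge_edge` is hypothetical, unit edge resistance is assumed by default, and the explicit bridge test (raising an error when $\Omega^{G_1}_{ij} = \omega_{ij}$) is an addition of this example rather than something prescribed in the text.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def delete_non_bridge_edge(P, i, j, omega=1.0):
    """Pseudo-inverse after deleting the non-bridge edge (i, j)."""
    d = P[:, i] - P[:, j]                          # l_xi - l_xj for every x
    omega_eff = P[i, i] + P[j, j] - 2.0 * P[i, j]  # Omega_ij in G1
    if np.isclose(omega_eff, omega):
        # For a bridge the denominator vanishes; that case needs the
        # bridge-edge treatment instead.
        raise ValueError("edge (i, j) is a bridge")
    return P + np.outer(d, d) / (omega - omega_eff)

# 4-cycle; deleting (0, 3) leaves the path 0-1-2-3.
cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
P1 = np.linalg.pinv(laplacian(4, cycle))
P2 = delete_non_bridge_edge(P1, 0, 3)
P2_ref = np.linalg.pinv(laplacian(4, cycle[:-1]))
print(np.allclose(P2, P2_ref))
```

Applying the same function to an edge of the resulting path raises the error, since every edge of a tree is a bridge.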



4.2. Deleting a Bridge Edge

Finally, we deal with the case when a bridge edge is deleted from a graph, thus rendering it disconnected
for the first time. This represents the point of singularity in the case of structural regress (analogous to the
first join). Continuing with our convention, let $G_1(V_1, E_1)$ be a simple, connected, undirected graph with a
bridge edge $e_{ij} \in E_1(G_1)$. Upon deleting $e_{ij}$, we obtain $G_2(V_2, E_2)$ and $G_3(V_3, E_3)$, two disjoint spanning
sub-graphs of $G_1$. The orders of $G_1$, $G_2$ and $G_3$ are respectively given by $n_1$, $n_2$ and $n_3$, while $L^+_{G_1}$, $L^+_{G_2}$ and
$L^+_{G_3}$ are the respective pseudo-inverse matrices of their Laplacians. It is easy to see that:

\[ \Omega^{G_1}_{xy} = \begin{cases} \Omega^{G_2}_{xy} & \text{if } x, y \in V_2(G_2) \\ \Omega^{G_3}_{xy} & \text{if } x, y \in V_3(G_3) \end{cases} \]

and the effective resistance between any $x \in V_2(G_2)$ and $y \in V_3(G_3)$ is $\infty$ after the deletion, as $G_2$ and $G_3$ are disjoint. To obtain $L^+_{G_2}$ and $L^+_{G_3}$ from $L^+_{G_1}$, we use the result
in Lemma 1.

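Theorem 5. $\forall (x, y) \in V_2(G_2) \times V_2(G_2)$ and $\forall (u, v) \in V_3(G_3) \times V_3(G_3)$,

\[ l^{+(2)}_{xy} = l^{+(1)}_{xy} - \frac{\sum_{z \in V_2(G_2)} \left(l^{+(1)}_{xz} + l^{+(1)}_{zy}\right)}{n_2} + \frac{\sum_{x \in V_2(G_2)} \sum_{y \in V_2(G_2)} l^{+(1)}_{xy}}{n_2^2} \tag{15} \]

\[ l^{+(3)}_{uv} = l^{+(1)}_{uv} - \frac{\sum_{w \in V_3(G_3)} \left(l^{+(1)}_{uw} + l^{+(1)}_{wv}\right)}{n_3} + \frac{\sum_{u \in V_3(G_3)} \sum_{v \in V_3(G_3)} l^{+(1)}_{uv}}{n_3^2} \tag{16} \]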



Note also that $L^+_{G_2} \in \Re^{n_2 \times n_2}$ and $L^+_{G_3} \in \Re^{n_3 \times n_3}$. For convenience, and without loss of generality, we assume
that the rows and columns of $L^+_{G_1} \in \Re^{n_1 \times n_1}$ have been pre-arranged in such a way that the first $(n_2 \times n_2)$
sub-matrix (upper-left) maps to the sub-graph $G_2$ and similarly the lower-right $(n_3 \times n_3)$ sub-matrix to $G_3$.
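Read this way, deleting a bridge amounts to double-centering the two diagonal blocks of $L^+_{G_1}$. The sketch below illustrates that reading on a small example. The names `double_center` and `delete_bridge_edge` are hypothetical, and the closed form used is the one obtained from Lemma 1 with $\Omega^{G_2}_{xy} = \Omega^{G_1}_{xy}$ on $V_2(G_2)$, so it should be taken as an illustration rather than the authors' code.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def double_center(B):
    """Subtract row and column means of a block and add back its grand mean."""
    n = B.shape[0]
    row = B.sum(axis=1, keepdims=True) / n
    col = B.sum(axis=0, keepdims=True) / n
    return B - row - col + B.sum() / n ** 2

def delete_bridge_edge(P1, nodes2, nodes3):
    """Pseudo-inverses of the two components left after removing a bridge.

    nodes2 and nodes3 list the node indices of the two components.
    """
    return (double_center(P1[np.ix_(nodes2, nodes2)]),
            double_center(P1[np.ix_(nodes3, nodes3)]))

# Path 0-1-2 and triangle 3-4-5 connected by the bridge (2, 3).
g2 = [(0, 1), (1, 2)]
g3 = [(0, 1), (1, 2), (0, 2)]
g1 = g2 + [(u + 3, v + 3) for u, v in g3] + [(2, 3)]
P1 = np.linalg.pinv(laplacian(6, g1))
P2, P3 = delete_bridge_edge(P1, [0, 1, 2], [3, 4, 5])

print(np.allclose(P2, np.linalg.pinv(laplacian(3, g2))),
      np.allclose(P3, np.linalg.pinv(laplacian(3, g3))))
```

Note that the end-points of the bridge do not appear in the computation; only the membership of nodes in the two components is needed.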

5. Bringing it together: Algorithm, Complexity and Parallelization 

In this section, we bring together the results obtained in §3 and §4 to bear on two important scenarios:
(a) dynamic (time-evolving) graphs (c.f. §5.1), and (b) real-world networks of large orders (c.f. §5.2). In
each case, we discuss the time complexity and parallelizability of our approach in detail.

5.1. Dynamic Graphs: Incremental Computation for Incremental Change

Dynamic graphs are often used to represent temporally changing systems. The most intuitively accessible
example of such a system is an online social network (OSN). An OSN evolves not only in terms of order,
through introduction and attrition of users with time, but also in terms of the social ties (or relationships)
between the users as new associations are formed and older ones fade. Mathematically, we model
an OSN as a dynamic graph $G_\tau(V_\tau, E_\tau)$ where the sub-index $\tau$ denotes the time parameter. We now study
a widely used model for dynamic, temporally evolving graphs called preferential attachment [3, 7, 23].

The preferential attachment model is a parametric model for network growth determined by parameters
$(n, \kappa)$ such that $n$ is the desired order of the network and $\kappa$ is the desired average degree per node. In its
simplest form, the model proceeds in discrete time steps whereby at each time instant $1 < \tau + 1 \le n$, a new
node $v_{\tau+1}$ is introduced in the network with $\kappa$ edges. This incoming node $v_{\tau+1}$ gets attached to a node
$v_i : 1 \le i \le \tau$, through exactly one of its $\kappa$ edges, with the following probability:

\[ P_{\tau+1}(i) = \frac{d_\tau(i)}{\sum_{j=1}^{\tau} d_\tau(j)} \tag{17} \]

where $d_\tau(i)$ is the degree of node $v_i$ at time $\tau$. The end-points of all the edges emanating from $v_{\tau+1}$ are
selected in a similar fashion. At the end of time step $\tau + 1$, we obtain $G_{\tau+1}(V_{\tau+1}, E_{\tau+1})$, and the process
continues until we have a graph $G_n(V_n, E_n)$ of order $n$.¹
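The growth process can be sketched as a small generator. Several choices below are assumptions of this example and not taken from the text: the function name `preferential_attachment`, the base substrate (a clique on $\kappa + 1$ nodes), and sampling the $\kappa$ neighbours of a new node without replacement so that the graph stays simple.

```python
import numpy as np

def preferential_attachment(n, kappa, seed=0):
    """Edge list of a graph grown by degree-proportional attachment.

    Assumed base substrate: a clique on kappa + 1 nodes. Each later node
    picks kappa distinct existing neighbours with probability proportional
    to their current degree.
    """
    rng = np.random.default_rng(seed)
    m0 = kappa + 1
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    degree = np.zeros(n)
    degree[:m0] = kappa                      # every clique node has degree kappa
    for new in range(m0, n):
        p = degree[:new] / degree[:new].sum()
        targets = rng.choice(new, size=kappa, replace=False, p=p)
        for t in targets:
            edges.append((int(t), new))
            degree[t] += 1
        degree[new] = kappa
    return edges

tree = preferential_attachment(100, 1)       # kappa = 1 grows a tree
print(len(tree))
```

With $\kappa = 1$ the base substrate is a single edge and every later node adds exactly one edge, so the result has $n - 1$ edges and is a tree, in line with the footnote.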

Simplistic though it may seem, this model has been shown to account for several characteristics observed 
in real-world networks, including the power law degree distributions, the small-world characteristics and 
the logarithmic growth of network diameter with time [3, 7, 23]. We return to these in detail in the next
sub-section while dealing with the more general case. 

It is easy to see that in order to study the structural evolution of dynamic networks, particularly in
terms of the sub-structures like spanning trees and forests ^] , or centralities of nodes and edges [40, 46],
or voltage distributions in growing conducting networks [51], we require not only the final state $G_n(V_n, E_n)$,
but all the intermediate states of the network. In other words, we need to compute the pseudo-inverses of
the Laplacians for all the graphs in the sequence $(G_1 \to G_2 \to \dots \to G_n)$. Clearly, if the standard methods
are used, the cost at time step $\tau$ is $O(\tau^3)$. The overall asymptotic cost for the entire sequence is then

\[ O\left(\sum_{\tau=1}^{n} \tau^3\right) = O\left(\left[\frac{n(n+1)}{2}\right]^2\right). \]

¹ In practice, for $\kappa > 1$, the process starts with a small connected network as a base substrate to facilitate probabilistic
selection of neighbors for an incoming node. For $\kappa = 1$, we may start with a singleton node, and the resulting structure is a
tree.
[Figure 6, panels (a) $\tau = 25$ and (b).]

Figure 6: Growing a tree by preferential attachment ($n = 100$, $\kappa = 1$). The node $v_\tau$, being added to the tree at time step $\tau$,
is emphasized (larger circle). Dotted edges at time steps $\tau = \{25, 50\}$ are a visual aid representing edges that are yet to be
added in the tree until the order-limit ($n = 100$) is attained.



In contrast, using our incremental approach, we can accomplish this at a much lower computational cost.
Note that in the case of growing networks, we do not need an explicit divide operation at all. The two
sub-problems at time step $\tau + 1$ are given a priori. We have $G_\tau(V_\tau, E_\tau)$ and a singleton vertex graph $\{v_{\tau+1}\}$
as a pair of disjoint sub-graphs. The $\kappa$ edges emanating from $\{v_{\tau+1}\}$ have end-points in $G_\tau$ as determined by
(17). The conquer operation is then performed through a first join between the singleton node $\{v_{\tau+1}\}$ and the
graph $G_\tau(V_\tau, E_\tau)$. We can assume that $L^+_{G_\tau}$ is already known at this time step (the induction hypothesis).
Also, $L^+_{\{v_{\tau+1}\}} = [0]$ and $n_2 = 1$ during the first join. Substituting in Theorem 2 we obtain the desired results.
The remaining $\kappa - 1$ edges are accounted for by edge firings (c.f. the discussion in §3). Therefore, we need
only $O(\kappa \cdot \tau^2)$ computations at time step $\tau$, and hence

\[ O\left(\kappa \sum_{\tau=1}^{n} \tau^2\right) = O\left(\kappa \cdot \frac{n(n+1)(2n+1)}{6}\right) \]

overall. As $\kappa \ll n$ in most practical cases, we have an order of magnitude lower average cost than that incurred by
the standard methods. Further improvements follow from the parallelizability of our approach. Although
we have not discussed it explicitly, it is evident that node and edge deletions can all be handled within this
framework in the same way and at the same $O(\tau^2)$ cost per operation (c.f. the discussion in §4).
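The incremental scheme for a growing graph can be sketched as follows: each new node enters through a first join with a singleton ($n_2 = 1$, pseudo-inverse $[0]$), and its remaining edges are fired one by one. The function names `add_node`, `fire_edge` and `grow` are hypothetical, unit edge resistances are assumed, and the singleton-join expressions are the specialisation of the first join to $n_2 = 1$ worked out for this example.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def add_node(P, i, omega=1.0):
    """First join of the current graph with a singleton attached to node i."""
    n1 = P.shape[0]
    n3 = n1 + 1
    a = P[:, i]
    c = P[i, i] + omega                      # l_ii + 0 + omega
    out = np.empty((n3, n3))
    out[:n1, :n1] = P - np.add.outer(a, a) / n3 + c / n3 ** 2
    cross = n1 * a / n3 - n1 * c / n3 ** 2
    out[:n1, n1] = cross
    out[n1, :n1] = cross
    out[n1, n1] = (n1 / n3) ** 2 * c
    return out

def fire_edge(P, i, j, omega=1.0):
    """Rank-one update for an extra edge (i, j) in a connected graph."""
    d = P[:, i] - P[:, j]
    return P - np.outer(d, d) / (omega + P[i, i] + P[j, j] - 2.0 * P[i, j])

def grow(P, attachments):
    """Return the pseudo-inverse after each new node; attachments[t] lists its neighbours."""
    history = []
    for targets in attachments:
        new = P.shape[0]
        P = add_node(P, targets[0])          # one O(tau^2) first join
        for t in targets[1:]:
            P = fire_edge(P, t, new)         # kappa - 1 edge firings, O(tau^2) each
        history.append(P)
    return history

triangle = [(0, 1), (1, 2), (0, 2)]
stages = grow(np.linalg.pinv(laplacian(3, triangle)), [[0, 1], [2, 3]])
final_edges = triangle + [(0, 3), (1, 3), (2, 4), (3, 4)]
print(np.allclose(stages[-1], np.linalg.pinv(laplacian(5, final_edges))))
```

Each stage touches every entry of the current matrix a constant number of times per edge, which is where the $O(\kappa \cdot \tau^2)$ count per time step comes from.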



5.2. Large Real-World Networks: A Divide-And-Conquer Approach

In order to compute $L^+$ for an arbitrary graph $G(V, E)$ in a divide-and-conquer fashion, we need to
first determine independent sub-graphs of $G$ in an efficient manner. Theoretically, an optimal divide step
entails determining a balanced connected bi-partition $P(G_1, G_2)$ of the graph $G$ such that $|V(G_1)| \approx |V(G_2)|$
and $|E(G_1, G_2)|$, the number of edges violating the partition, is minimized. Such balanced bi-partitioning
of the graph, if feasible, can then be repeated recursively until we obtain sub-graphs of relatively small
orders. The solutions to these sub-problems can then be computed and the recursion unwinds to yield
the final result, using our two-stage methodology in the respective conquer steps. Alas, computing such
balanced bi-partitions, along with the condition of minimality of $|E(G_1, G_2)|$, belongs to the class of
NP-Complete problems [49], and hence no polynomial time solution is known. We therefore need an efficient
alternative to accomplish the task at hand. Partitioning of graphs to realize certain objectives has been
studied extensively in diverse domains such as VLSI CAD [5], parallel computing, artificial intelligence and
image processing [50], and power systems modeling [49, 53]. Perhaps the most celebrated results in this class
of problems are the spectral method [34] and the max-flow = min-cut [26], both of which are computable
in polynomial time 3J, |21[ . Approximation algorithms for the balanced connected bi-partitions problem,
for some special cases, have also been proposed [,17,, ,6]. Although useful in specific instances, such methods
when used for the divide step may, in themselves, incur high computational costs, thus undermining the gains
of the conquer step. We need a simple methodology that works well on real-world networks.






G(V, E)    n = |V(G)|   m = |E(G)|   Leaves   Cut-off   # Comp.   |V(GCC)|   |E(GCC)|   # Cut-Edges
Epinions   75,888       405,740      35,763   4,429     30,376    37,924     61,482     102,452
SlashDot   82,168       504,230      28,499   7,012     36,311    41,084     62,225     164,719

Table 1: Basic properties: Epinions and SlashDot networks. 
[Figure 7 panels: (b) SlashDot degree distribution; (c) Epinions: components at cut-off; (d) SlashDot: components at cut-off.]

Figure 7: Structural regress: Epinions and SlashDot Networks. 



Real-world networks, and particularly online social networks, have been shown to have several interesting
structural properties: edge sparsity, power-law scale-free degree distributions, existence of the so called
rich club connectivity [3, 7, 23], and small-world characteristics [54] with relatively small diameters ($O(\log n)$).
Collectively, these properties amount to a simple fact: the overall connectivity between arbitrary node pairs
is dependent on higher degree nodes in the network. Based on these insights, we now study two real-world
online social networks, the Epinions and SlashDot networks [1], to attain our objective of a quick and
easy divide step. Table 1 gives some of the basic statistics about the two networks.² It is easy to see that
the networks are sparse as $m = O(n) \ll O(n^2)$ in both cases. Moreover, note that a significant fraction
of nodes in the graphs are leaf/pendant nodes, i.e. nodes of degree 1 (≈ 47% for Epinions and ≈ 34% for
SlashDot). From Fig. 7(a-b), it is also evident that the node degrees indeed follow a heavy tail distribution
in both cases. Thus, there are many nodes of very small degree (e.g. leaves) and relatively fewer nodes of
very high degrees in these networks. Therefore, in order to break the graph into smaller sub-graphs, we
adopt an incremental regress methodology of deleting high degree nodes. Ordering the nodes in decreasing
order by degree, we remove them one at a time. This process divides the set of nodes into three parts at
each stage:

a. The Rich Club: High degree nodes that have been deleted until that stage. 

b. The Giant Connected Component (GCC): The largest connected component at that stage. 

c. Others: All nodes that are neither in the rich club nor the GCC. 



² Although the networks originally have uni-directional and bi-directional links, we symmetrize the uni-directional edges to
make the graphs undirected.
We repeat the regress, one node at a time, until the size of the GCC is less than half the size of the original
graph. We call this the cut-off point. We then retain the GCC as one of our sub-graphs (one independent 
sub-problem) and re-combine all the non-GCC nodes together with the rich club to obtain (possibly) multiple 
sub-graphs (other sub-problems). This concludes the divide step. 
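The divide step described above can be sketched in a few lines. This is a naive illustration, not the procedure used for the reported experiments: the function names `components` and `divide` are hypothetical, nodes are ordered once by their initial degree (an assumption), and connected components are recomputed from scratch after every removal, which would be far too slow for networks of the size in Table 1.

```python
from collections import deque

def components(adj, removed=frozenset()):
    """Connected components of the graph after deleting the nodes in `removed`."""
    seen = set(removed)
    comps = []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def divide(adj):
    """Remove high-degree nodes until the GCC holds fewer than half the nodes.

    Returns the rich club, the GCC, and the sub-graphs obtained by
    re-combining all non-GCC nodes with the rich club.
    """
    n = len(adj)
    order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    rich_club = []
    comps = components(adj)
    for u in order:
        if max((len(c) for c in comps), default=0) < n / 2:
            break                            # cut-off point reached
        rich_club.append(u)
        comps = components(adj, set(rich_club))
    gcc = max(comps, key=len, default=[])
    # Components induced on everything outside the GCC (rich club + others).
    other_subgraphs = components(adj, set(gcc))
    return rich_club, gcc, other_subgraphs

# A hub (node 0) holding three small groups together.
adj = {0: [1, 2, 3, 4, 5, 6], 1: [0, 2], 2: [0, 1],
       3: [0, 4], 4: [0, 3], 5: [0, 6], 6: [0, 5]}
rich_club, gcc, other_subgraphs = divide(adj)
print(rich_club, gcc, other_subgraphs)
```

In this toy graph a single removal already reaches the cut-off; the hub then rejoins the non-GCC groups into one sub-graph, mirroring the re-combination step.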

Table 1 shows the relevant statistics at the cut-off point for the two networks. Note that the cut-off point
is attained at the expense of a relatively small number of high degree nodes (≈ 5% for Epinions and ≈ 8%
for SlashDot). Moreover, the number of nodes in the GCC is indeed roughly half of the overall order, although
the GCC is clearly sparser in terms of edge density than the overall network ($|E(GCC)|/|V(GCC)| = 1.63$
vs. $|E(G)|/|V(G)| = 5.35$ for Epinions and $|E(GCC)|/|V(GCC)| = 1.51$ vs. $|E(G)|/|V(G)| = 6.13$ for
SlashDot). Fig. 7(c-d) shows the sizes (in terms of nodes) of all the connected components for the respective
graphs at the cut-off point. It is easy to see that other than the GCC, the remaining components are of
negligibly small orders. Recombining the non-GCC components together (including the rich club) yields an
interesting result. For the Epinions network, we obtain two sub-graphs of orders 37,933 and 31 respectively,
while for the SlashDot network we obtain exactly one sub-graph of order 41,084. This clearly demonstrates
that our simple divide method yields a roughly equal partitioning of the network, and thus comparable
sub-problems, in terms of nodes. The pseudo-inverses of these sub-problems can now be computed in
parallel. However, as with all trade-offs, this equitable split comes at a price of roughly $\kappa = O(n)$ edges
that violate the cut (c.f. Table 1). This yields an $O(n^3)$ average cost for the two-stage process (c.f. [g]).
Nevertheless, given the element-wise parallelizability of our method, we obtain the pseudo-inverses in acceptable
times of roughly 15 minutes for the Epinions and 18 minutes for the SlashDot networks.

6. Related Work 

The applications of the Moore-Penrose pseudo-inverse and the sub-matrix inverses of the Laplacian for
a graph are multifarious. We discuss a few instances here in summary. As alluded to earlier, $L^+$ is used
to compute effective resistance distances between the nodes of a graph [33] as well as the one-way hitting
and commute times in random walks between node pairs in a graph. All these distances serve as measures
of multi-hop dissimilarity between nodes and find applications in several graph mining contexts [14, 28].
Moreover, as every connected undirected graph has an analogous reversible Markov chain, $L^+$ finds use
in the computation of relevant metrics (such as hitting time, cover time and mixing rates).

$L^+$ is a Gram matrix. Its eigen decomposition yields an $n$-dimensional Euclidean embedding of the
graph whereby each node in the graph is represented as a point in that space. The general term $l^+_{xy}$ represents
the inner product of the respective position vectors for the nodes $x$ and $y$ and thus $L^+$ is a valid kernel for
a graph. This geometric interpretation has been used in collaborative recommendation systems [28, 27, 29].
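The geometric reading can be made concrete with a short numerical sketch. The variable names are illustrative and the 4-cycle is an arbitrary example graph. The code embeds the nodes using the eigen decomposition of $L^+$, then checks that inner products reproduce $L^+$ and that squared Euclidean distances equal the effective resistances.

```python
import numpy as np

# Laplacian of a 4-cycle (each node has two neighbours).
L = np.array([[ 2., -1.,  0., -1.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [-1.,  0., -1.,  2.]])
P = np.linalg.pinv(L)

# Eigen decomposition of the symmetric positive semi-definite matrix L+.
w, U = np.linalg.eigh(P)
w = np.clip(w, 0.0, None)          # remove tiny negative round-off
X = U * np.sqrt(w)                 # row x holds the coordinates of node x

gram = X @ X.T                     # inner products of position vectors
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

diag = np.diag(P)
omega = diag[:, None] + diag[None, :] - 2.0 * P   # effective resistances

print(np.allclose(gram, P), np.allclose(sq_dist, omega))
```

For the 4-cycle with unit resistances, adjacent nodes end up at squared distance 3/4 and opposite nodes at squared distance 1, matching the series-parallel resistances.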



In [15, 16] the elements of $L^+$ have been given an elegant topological interpretation in terms of the dense
spanning rooted forests of the graph. Namely, the general term $l^+_{xy}$ represents the number of spanning rooted
forests of the graph with exactly two trees in which the pair $(x, y)$ is in the same tree, rooted at $x$ (or $y$
by symmetry). Combining the geometric interpretation from [28, 27, 29] and the topological interpretation
from [15, 16], the diagonal elements of $L^+$ have been used as centrality indices for the nodes of a complex
network in [46]. This centrality index, called topological centrality, reflects both the overall position as well
as the overall connectedness of nodes in the network. Consequently, it is a measure of the robustness of a node
to random multiple failures in the network. By extension, $Tr(L^+)$, also called the Kirchhoff index of a graph
[33, 55], is a global structural descriptor for the graph as a whole. This index is quite popular in the field of
mathematical chemistry and is used to measure overall molecular strength [42, 43, 44, 45].

The elements of sub-matrix inverses have analogous interpretations in terms of unrooted spanning forests
of the graph [32]. In [40], the sub-matrix inverses have been used to compute the random-walk betweenness
centrality, another useful index to characterize roles of nodes in a network.
Table 2: Summary of results: Atomic operations of the divide-and-conquer methodology.



7. Conclusion and Future Work 

In this work, we presented a divide-and-conquer based approach for computing the Moore-Penrose
pseudo-inverse of the Laplacian ($L^+$) for a simple, connected, undirected graph. Our method relies on an elegant
interplay between the elements of $L^+$ and the pairwise effective resistance distances in the graph. Exploiting
this relationship, we derived closed form solutions that enable us to compute $L^+$ in an incremental fashion.
We also extended these results to analogous cases for structural regress. Using dynamic networks and
online social networks as examples, we demonstrated the efficacy of our method for computing the
pseudo-inverse faster than the standard methods. The insights from our work open up several interesting
questions for future research. First and foremost, similar explorations can be done for the case of directed
graphs (asymmetric relationships), where analogous distance functions, such as the expected commute
time in random walks, are defined, although the Laplacians (more than one kind in the literature) are no longer
symmetric IQ]. Secondly, matrix-distance interplays of the kind exploited in this work also exist for a general
case of the so called forest matrix and its distance counterpart, the forest distance [2, 15], both for undirected
and directed graphs. The results presented here should find natural extensions to the forest matrix and the
forest metric, at least for the undirected case. Finally, our closed forms can be used in conjunction with
several interesting approaches for sparse inverse computations [13], to further expedite the pseudo-inverse
computation for large generalized graphs. All these leave ample scope for future work.
9. Appendix

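9.1. Proof of Theorem 1

Given $L \in \Re^{n \times n}$, we note that $L \cdot \mathbf{1} = \mathbf{0}$ and $\mathbf{1}' \cdot L = \mathbf{0}'$, where $\mathbf{1} \in \Re^n$ and $\mathbf{0} \in \Re^n$ are vectors of length
$n$ containing all 1's and 0's respectively. From [10], we have:

\[ L(\{n\},\{n\})^{-1} = \left[ I_{n-1}, \; -v \right] L^+ \begin{bmatrix} I_{n-1} \\ -u' \end{bmatrix} \tag{18} \]

where $I_{n-1}$ is the identity matrix of dimension $(n-1 \times n-1)$ and $u = v = \mathbf{1} \in \Re^{n-1}$ are vectors of all 1's
of length $n-1$. Expanding, we obtain the following scalar form:

\[ \left[ L(\{n\},\{n\})^{-1} \right]_{xy} = l^+_{xy} - l^+_{xn} - l^+_{ny} + l^+_{nn} \tag{19} \]

□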
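The scalar form (19) is easy to confirm numerically. The sketch below uses an arbitrary small connected graph and illustrative variable names. It inverts the Laplacian with its last row and column removed and compares the result with the expression built from the elements of $L^+$.

```python
import numpy as np

# Laplacian of a small connected graph: a 4-cycle 0-1-2-3 plus a pendant node 4 on node 0.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 4)]
n = 5
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1.0
    L[v, v] += 1.0
    L[u, v] -= 1.0
    L[v, u] -= 1.0

P = np.linalg.pinv(L)

# Inverse of the sub-matrix obtained by deleting the last row and column.
sub_inverse = np.linalg.inv(L[:-1, :-1])

# Scalar form: l_xy - l_xn - l_ny + l_nn for all x, y < n.
scalar_form = P[:-1, :-1] - P[:-1, -1:] - P[-1:, :-1] + P[-1, -1]

print(np.allclose(sub_inverse, scalar_form))
```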

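9.2. Proof of Lemma 1

Given that $L^+$ is symmetric and doubly centered and $\Omega_{xy} = l^+_{xx} + l^+_{yy} - l^+_{xy} - l^+_{yx}$, we have:

\[ \sum_{z=1}^{n} \left( \Omega_{xz} + \Omega_{zy} - \Omega_{xy} \right) = \sum_{z=1}^{n} \left[ \left(l^+_{xx} + l^+_{zz} - 2 l^+_{xz}\right) + \left(l^+_{zz} + l^+_{yy} - 2 l^+_{zy}\right) - \left(l^+_{xx} + l^+_{yy} - 2 l^+_{xy}\right) \right] \]

\[ = 2 \sum_{z=1}^{n} \left( l^+_{zz} + l^+_{xy} \right) = 2n\, l^+_{xy} + 2\, Tr(L^+) \]

Rearranging terms, we get:

\[ l^+_{xy} = \frac{1}{2n} \left( \sum_{z=1}^{n} \Omega_{xz} + \Omega_{zy} - \Omega_{xy} \right) - \frac{1}{n}\, Tr(L^+) \tag{20} \]

Substituting $Tr(L^+) = \frac{1}{2n} \sum_{x=1}^{n} \sum_{y=1}^{n} \Omega_{xy}$ in the expression above, we obtain the proof.

□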
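Equation (20) states that $L^+$ can be recovered from the matrix of effective resistances alone. A minimal numerical sketch of that statement follows. The function name `pinv_from_resistances` and the example graph are illustrative choices, not part of the original text.

```python
import numpy as np

def pinv_from_resistances(R):
    """Recover L+ from the full matrix of pairwise effective resistances."""
    n = R.shape[0]
    trace = R.sum() / (2.0 * n)              # Tr(L+) = (1 / 2n) * sum of all Omega_xy
    row = R.sum(axis=1, keepdims=True)       # sum_z Omega_xz
    col = R.sum(axis=0, keepdims=True)       # sum_z Omega_zy
    return (row + col - n * R) / (2.0 * n) - trace / n

# Example: triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1.0
    L[v, v] += 1.0
    L[u, v] -= 1.0
    L[v, u] -= 1.0

P = np.linalg.pinv(L)
d = np.diag(P)
R = d[:, None] + d[None, :] - 2.0 * P        # Omega_xy = l_xx + l_yy - 2 l_xy

print(np.allclose(pinv_from_resistances(R), P))
```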

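9.3. Proof of Corollary 1

Given a star $S_p$ with node 1 as root and nodes $\{2, 3, \dots, p\}$ as leaves, we have:

\[ \forall x : 2 \le x \le p, \;\; \Omega^{S_p}_{1x} = 1 \quad \text{and} \quad \forall x \ne y : 2 \le x, y \le p, \;\; \Omega^{S_p}_{xy} = 2 \tag{21} \]

Therefore,

\[ \sum_{x=1}^{p} \sum_{y=1}^{p} \Omega^{S_p}_{xy} = 2(p-1)^2 \]

Also,

\[ \sum_{z=1}^{p} \left( \Omega^{S_p}_{1z} + \Omega^{S_p}_{z1} - \Omega^{S_p}_{11} \right) = 2(p-1) \quad \text{and} \quad \forall x \ne y : 2 \le x, y \le p, \;\; \sum_{z=1}^{p} \left( \Omega^{S_p}_{xz} + \Omega^{S_p}_{zy} - \Omega^{S_p}_{xy} \right) = 2(p-3) \]

Substituting for these values in Lemma 1 and noting that $L^+_{S_p}$ is doubly-centered, we obtain the proof.

□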

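9.4. Proof of Corollary 2

Given a clique $K_p$ of order $p$, $\forall x \ne y : 1 \le x, y \le p$, $\Omega^{K_p}_{xy} = \frac{2}{p}$ [9]. Therefore,

\[ \sum_{x=1}^{p} \sum_{y=1}^{p} \Omega^{K_p}_{xy} = 2(p-1) \]

Also,

\[ \forall x : 1 \le x \le p, \;\; \sum_{z=1}^{p} \left( \Omega^{K_p}_{xz} + \Omega^{K_p}_{zx} - \Omega^{K_p}_{xx} \right) = \frac{4(p-1)}{p} \]

Substituting in Lemma 1 we obtain: $l^+_{xx} = \frac{p-1}{p^2}$ and, from the fact that $L^+_{K_p}$ is doubly-centered,

\[ \forall x \ne y : 1 \le x, y \le p, \;\; l^+_{xy} = -\frac{l^+_{xx}}{p-1} = -\frac{1}{p^2} \]

□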
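The intermediate sums used in the star and clique proofs can be checked numerically. The sketch below is illustrative: nodes are indexed from 0 (so the root of the star is node 0), $p = 6$ is an arbitrary choice, and the helper names are hypothetical.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def resistances(P):
    """Pairwise effective resistances from a pseudo-inverse."""
    d = np.diag(P)
    return d[:, None] + d[None, :] - 2.0 * P

p = 6
star = [(0, k) for k in range(1, p)]                       # root 0, leaves 1..p-1
clique = [(u, v) for u in range(p) for v in range(u + 1, p)]

R_star = resistances(np.linalg.pinv(laplacian(p, star)))
P_clique = np.linalg.pinv(laplacian(p, clique))
R_clique = resistances(P_clique)

# Star: total resistance sum, root sum, and the sum for a pair of distinct leaves.
star_total = R_star.sum()
root_sum = (R_star[0, :] + R_star[:, 0] - R_star[0, 0]).sum()
leaf_sum = (R_star[1, :] + R_star[:, 2] - R_star[1, 2]).sum()
print(np.isclose(star_total, 2 * (p - 1) ** 2),
      np.isclose(root_sum, 2 * (p - 1)),
      np.isclose(leaf_sum, 2 * (p - 3)))

# Clique: constant off-diagonal resistance and the closed-form entries of L+.
off_diag = ~np.eye(p, dtype=bool)
print(np.allclose(R_clique[off_diag], 2.0 / p),
      np.allclose(np.diag(P_clique), (p - 1) / p ** 2),
      np.allclose(P_clique[off_diag], -1.0 / p ** 2))
```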

9.5. Proof of Theorem 2

We present the proofs for the following two cases: (a) $x, y \in V_1(G_1)$ and (b) $x \in V_1(G_1)$ and $y \in V_2(G_2)$.
It is obvious that the other two cases, viz. (c) $x, y \in V_2(G_2)$ and (d) $x \in V_2(G_2)$ and $y \in V_1(G_1)$, follow from
symmetry. But first we must express $Tr(L^+_{G_3})$ as a function of $(Tr(L^+_{G_1}), Tr(L^+_{G_2}))$, which is useful to us in
both cases.

Lemma 2. For two disjoint simple, connected, undirected graphs $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$, let $G_3(V_3, E_3)$
be the graph resulting from the first join between $G_1$ and $G_2$ by means of introducing an edge $e_{ij} : i \in
V_1(G_1), j \in V_2(G_2)$. Then,

\[ Tr(L^+_{G_3}) = Tr(L^+_{G_1}) + Tr(L^+_{G_2}) + \frac{n_1 n_2}{n_3} \left( l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij} \right) \tag{22} \]

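9.5.1. Proof of Lemma 2

For an arbitrary node $x \in V_1(G_1)$:

\[ \Omega^{G_3}_{xy} = \begin{cases} \Omega^{G_1}_{xy} & \text{if } y \in V_1(G_1) \\ \Omega^{G_1}_{xi} + \omega_{ij} + \Omega^{G_2}_{jy} & \text{if } y \in V_2(G_2) \end{cases} \]

Therefore,

\[ \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} = n_1\, l^{+(1)}_{xx} + Tr(L^+_{G_1}) + n_2 \left( l^{+(1)}_{xx} + l^{+(1)}_{ii} - 2\, l^{+(1)}_{xi} \right) + n_2\, \omega_{ij} + n_2\, l^{+(2)}_{jj} + Tr(L^+_{G_2}) \]

Summing up over all nodes $x \in V_1(G_1)$:

\[ \sum_{x \in V_1(G_1)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} = (2 n_1 + n_2)\, Tr(L^+_{G_1}) + n_1 n_2 \left( l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij} \right) + n_1\, Tr(L^+_{G_2}) \]

By symmetry,

\[ \sum_{x \in V_2(G_2)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} = (2 n_2 + n_1)\, Tr(L^+_{G_2}) + n_1 n_2 \left( l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij} \right) + n_2\, Tr(L^+_{G_1}) \]

Therefore,

\[ \sum_{x \in V_3(G_3)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} = \sum_{x \in V_1(G_1)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} + \sum_{x \in V_2(G_2)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy} \]

\[ = 2 (n_1 + n_2) \left( Tr(L^+_{G_1}) + Tr(L^+_{G_2}) \right) + 2\, n_1 n_2 \left( l^{+(1)}_{ii} + l^{+(2)}_{jj} + \omega_{ij} \right) \]

Now, substituting $Tr(L^+_{G_3}) = \frac{1}{2 n_3} \sum_{x \in V_3(G_3)} \sum_{y \in V_3(G_3)} \Omega^{G_3}_{xy}$, we obtain the result.

□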
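9.5.2. Rest of the Proof of Theorem 2
Case a: $x, y \in V_1(G_1)$

From Lemma 1 we have:

\[ l^{+(3)}_{xy} = \frac{1}{2 n_3} \left( \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xz} + \Omega^{G_3}_{zy} - \Omega^{G_3}_{xy} \right) - \frac{1}{n_3}\, Tr(L^+_{G_3}) \tag{23} \]

For the triangle inequality in the RHS above:

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xz} = \sum_{z \in V_1(G_1)} \Omega^{G_1}_{xz} + \sum_{z \in V_2(G_2)} \left( \Omega^{G_1}_{xi} + \omega_{ij} + \Omega^{G_2}_{jz} \right) \]

\[ = (n_1 + n_2)\, l^{+(1)}_{xx} + Tr(L^+_{G_1}) + n_2\, l^{+(1)}_{ii} - 2 n_2\, l^{+(1)}_{xi} + n_2\, \omega_{ij} + n_2\, l^{+(2)}_{jj} + Tr(L^+_{G_2}) \]

By symmetry,

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{zy} = (n_1 + n_2)\, l^{+(1)}_{yy} + Tr(L^+_{G_1}) + n_2\, l^{+(1)}_{ii} - 2 n_2\, l^{+(1)}_{iy} + n_2\, \omega_{ij} + n_2\, l^{+(2)}_{jj} + Tr(L^+_{G_2}) \]

Finally, for the last of the three terms:

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xy} = (n_1 + n_2) \left( l^{+(1)}_{xx} + l^{+(1)}_{yy} - 2\, l^{+(1)}_{xy} \right) \]

Summing the three individual terms along with the value of $Tr(L^+_{G_3})$ from (22) and substituting the result
in (23), we obtain the proof.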



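Case b: $x \in V_1(G_1)$ and $y \in V_2(G_2)$
Once again,

\[ l^{+(3)}_{xy} = \frac{1}{2 n_3} \left( \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xz} + \Omega^{G_3}_{zy} - \Omega^{G_3}_{xy} \right) - \frac{1}{n_3}\, Tr(L^+_{G_3}) \tag{24} \]

For the triangle inequality in the RHS above:

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xz} = \sum_{z \in V_1(G_1)} \Omega^{G_1}_{xz} + \sum_{z \in V_2(G_2)} \left( \Omega^{G_1}_{xi} + \omega_{ij} + \Omega^{G_2}_{jz} \right) \]

\[ = (n_1 + n_2)\, l^{+(1)}_{xx} + Tr(L^+_{G_1}) + n_2\, l^{+(1)}_{ii} - 2 n_2\, l^{+(1)}_{xi} + n_2\, \omega_{ij} + n_2\, l^{+(2)}_{jj} + Tr(L^+_{G_2}) \]

Similarly,

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{zy} = \sum_{z \in V_1(G_1)} \left( \Omega^{G_1}_{zi} + \omega_{ij} + \Omega^{G_2}_{jy} \right) + \sum_{z \in V_2(G_2)} \Omega^{G_2}_{zy} \]

\[ = (n_1 + n_2)\, l^{+(2)}_{yy} + Tr(L^+_{G_2}) + n_1\, l^{+(2)}_{jj} - 2 n_1\, l^{+(2)}_{jy} + n_1\, \omega_{ij} + n_1\, l^{+(1)}_{ii} + Tr(L^+_{G_1}) \]

And finally,

\[ \sum_{z \in V_3(G_3)} \Omega^{G_3}_{xy} = \sum_{z \in V_3(G_3)} \left( \Omega^{G_1}_{xi} + \omega_{ij} + \Omega^{G_2}_{jy} \right) \]

\[ = (n_1 + n_2) \left( l^{+(1)}_{xx} + l^{+(1)}_{ii} - 2\, l^{+(1)}_{xi} + \omega_{ij} + l^{+(2)}_{yy} + l^{+(2)}_{jj} - 2\, l^{+(2)}_{jy} \right) \]

Summing the three individual terms along with the value of $Tr(L^+_{G_3})$ from (22) and substituting the result
in (24), we obtain the proof.

□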



9.6. Proof of Theorem 3

We prove Theorem 3 in two steps. First, in the following lemma, we provide a general result for a
perturbation of a positive semi-definite matrix, which is then used to prove the overall theorem.

Lemma 3. Given $V \in \Re^{n \times n}$ is a symmetric, positive semi-definite matrix and $X \in \Re^{n \times n}$ a perturbation
matrix, such that $(I + \alpha X^T V^+ X)$ has an inverse where $\alpha = \{1, -1\}$, and $V V^+ X = X$, the following holds:

\[ (V + \alpha X X^T)^+ = V^+ - \alpha V^+ X (I + \alpha X^T V^+ X)^{-1} X^T V^+ \tag{25} \]

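9.6.1. Proof of Lemma 3

Observe that the positive semi-definiteness of $V$ guarantees that $I + X^T V^+ X$ has an inverse.

Let $W = V^+ - \alpha V^+ X (I + \alpha X^T V^+ X)^{-1} X^T V^+$. Therefore, using $\alpha^2 = 1$ and $V V^+ X = X$,

\[ \begin{aligned}
(V + \alpha X X^T) W &= V V^+ - \alpha V V^+ X (I + \alpha X^T V^+ X)^{-1} X^T V^+ \\
&\quad + \alpha X X^T V^+ - X X^T V^+ X (I + \alpha X^T V^+ X)^{-1} X^T V^+ \\
&= V V^+ - \alpha X (I + \alpha X^T V^+ X)^{-1} X^T V^+ \\
&\quad + \alpha X \left[ I - \alpha X^T V^+ X (I + \alpha X^T V^+ X)^{-1} \right] X^T V^+ \\
&= V V^+ - \alpha X (I + \alpha X^T V^+ X)^{-1} X^T V^+ \\
&\quad + \alpha X \left[ \left( (I + \alpha X^T V^+ X) - \alpha X^T V^+ X \right) (I + \alpha X^T V^+ X)^{-1} \right] X^T V^+ \\
&= V V^+ - \alpha X (I + \alpha X^T V^+ X)^{-1} X^T V^+ + \alpha X \left[ (I + \alpha X^T V^+ X)^{-1} \right] X^T V^+ \\
&= V V^+
\end{aligned} \]

From this identity, and the symmetry of $V$, $W$ and $(X X^T)$, it follows easily that $W$ satisfies the four conditions
required for a Moore-Penrose pseudo-inverse.

□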



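9.6.2. Rest of the Proof of Theorem 3

Note that the firing of the edge $e_{ij}$ in $G_1(V_1, E_1)$ to obtain $G_2(V_2, E_2)$ results in the following scalar
relationships between the Laplacians of the two graphs:

\[ \text{a. } [L_{G_2}]_{ii} = [L_{G_1}]_{ii} + \frac{1}{\omega_{ij}} \qquad \text{b. } [L_{G_2}]_{jj} = [L_{G_1}]_{jj} + \frac{1}{\omega_{ij}} \qquad \text{c. } [L_{G_2}]_{ij} = [L_{G_1}]_{ij} - \frac{1}{\omega_{ij}} \tag{26} \]

For ease of exposition, we permute the rows and columns in $L_{G_1}$ and $L_{G_2}$ in such a way that $i = 1$ and
$j = 2$. The above perturbations can then be rewritten as:

\[ L_{G_2} = L_{G_1} + \frac{1}{\omega_{12}} \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{27} \]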
$L_{G_2}$ is therefore a sum of $L_{G_1}$, a real, symmetric, positive semi-definite matrix, and a rank(1)
perturbation matrix, referred to henceforth as $Y$. It is easy to see that for a simple, connected, undirected
graph $G_1(V_1, E_1)$, $L_{G_1}$ satisfies all the preconditions in Lemma 3. Substituting $V = L_{G_1}$, $\alpha = 1$ and
$X = \sqrt{Y} = X^T$ in Lemma 3, we get:

\[ L^+_{G_2} = (L_{G_1} + X X^T)^+ = L^+_{G_1} - L^+_{G_1} X \left( I + X L^+_{G_1} X \right)^{-1} X L^+_{G_1} \tag{28} \]

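All that remains now is to obtain the scalar form for the term: $L^+_{G_1} X \left( I + X L^+_{G_1} X \right)^{-1} X L^+_{G_1}$. Note:

\[ L^+_{G_1} X = \frac{1}{\sqrt{2\, \omega_{12}}} \begin{bmatrix} \left(l^{+(1)}_{11} - l^{+(1)}_{12}\right) & \left(l^{+(1)}_{12} - l^{+(1)}_{11}\right) & 0 & \cdots & 0 \\ \left(l^{+(1)}_{21} - l^{+(1)}_{22}\right) & \left(l^{+(1)}_{22} - l^{+(1)}_{21}\right) & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \left(l^{+(1)}_{n1} - l^{+(1)}_{n2}\right) & \left(l^{+(1)}_{n2} - l^{+(1)}_{n1}\right) & 0 & \cdots & 0 \end{bmatrix} \tag{29} \]

Similarly,

\[ X L^+_{G_1} = \frac{1}{\sqrt{2\, \omega_{12}}} \begin{bmatrix} \left(l^{+(1)}_{11} - l^{+(1)}_{21}\right) & \left(l^{+(1)}_{12} - l^{+(1)}_{22}\right) & \cdots & \left(l^{+(1)}_{1n} - l^{+(1)}_{2n}\right) \\ \left(l^{+(1)}_{21} - l^{+(1)}_{11}\right) & \left(l^{+(1)}_{22} - l^{+(1)}_{12}\right) & \cdots & \left(l^{+(1)}_{2n} - l^{+(1)}_{1n}\right) \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \tag{30} \]

Also,

\[ X L^+_{G_1} X = \frac{\Omega^{G_1}_{12}}{2\, \omega_{12}} \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{31} \]

where $\Omega^{G_1}_{12} = l^{+(1)}_{11} + l^{+(1)}_{22} - l^{+(1)}_{12} - l^{+(1)}_{21}$, which yields:

\[ \left( I + X L^+_{G_1} X \right)^{-1} = \begin{bmatrix} \dfrac{2\omega_{12} + \Omega^{G_1}_{12}}{2\left(\omega_{12} + \Omega^{G_1}_{12}\right)} & \dfrac{\Omega^{G_1}_{12}}{2\left(\omega_{12} + \Omega^{G_1}_{12}\right)} & 0 \\ \dfrac{\Omega^{G_1}_{12}}{2\left(\omega_{12} + \Omega^{G_1}_{12}\right)} & \dfrac{2\omega_{12} + \Omega^{G_1}_{12}}{2\left(\omega_{12} + \Omega^{G_1}_{12}\right)} & 0 \\ 0 & 0 & I_{n-2,n-2} \end{bmatrix} \tag{32} \]

Multiplying on the left and right sides of the RHS above with $L^+_{G_1} X$ and $X L^+_{G_1}$ respectively, the following
scalar form is obtained:

\[ l^{+(2)}_{xy} = l^{+(1)}_{xy} - \frac{\left(l^{+(1)}_{x1} - l^{+(1)}_{x2}\right)\left(l^{+(1)}_{1y} - l^{+(1)}_{2y}\right)}{\omega_{12} + \Omega^{G_1}_{12}} \tag{33} \]

Substituting $i = 1$ and $j = 2$ back into the equation above, we obtain the proof.

□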
9.7. Proof of Corollary 3

Noting $\Omega_{xy} = l^+_{xx} + l^+_{yy} - l^+_{xy} - l^+_{yx}$ and substituting into the result of Theorem 3, we obtain the
proof.

□

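9.8. Proof of Theorem 4

Deleting a non-bridge edge $e_{ij} \in E_1(G_1)$ from $G_1(V_1, E_1)$ to obtain $G_2(V_2, E_2)$ results in the following
scalar relationships between the Laplacians of the two graphs:

\[ \text{a. } [L_{G_2}]_{ii} = [L_{G_1}]_{ii} - \frac{1}{\omega_{ij}} \qquad \text{b. } [L_{G_2}]_{jj} = [L_{G_1}]_{jj} - \frac{1}{\omega_{ij}} \qquad \text{c. } [L_{G_2}]_{ij} = [L_{G_1}]_{ij} + \frac{1}{\omega_{ij}} \tag{34} \]

Once again, for convenience, we rearrange the rows and columns of $L_{G_1}$ and $L_{G_2}$ in such a way that $i = 1$
and $j = 2$. Thus,

\[ L_{G_2} = L_{G_1} - \frac{1}{\omega_{12}} \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{35} \]

The rest of the proof follows as in the proof of Theorem 3 with the following modification. Substituting
$V = L_{G_1}$, $\alpha = -1$ and $X = \sqrt{Y} = X^T$ in Lemma 3, we get:

\[ L^+_{G_2} = (L_{G_1} - X X^T)^+ = L^+_{G_1} + L^+_{G_1} X \left( I - X L^+_{G_1} X \right)^{-1} X L^+_{G_1} \tag{36} \]

which yields the following scalar form:

\[ l^{+(2)}_{xy} = l^{+(1)}_{xy} + \frac{\left(l^{+(1)}_{x1} - l^{+(1)}_{x2}\right)\left(l^{+(1)}_{1y} - l^{+(1)}_{2y}\right)}{\omega_{12} - \Omega^{G_1}_{12}} \tag{37} \]

Substituting $i = 1$ and $j = 2$ back into the equation above, we obtain the proof.

□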

9.9. Proof of Corollary 4

Noting $\Omega_{xy} = l^+_{xx} + l^+_{yy} - l^+_{xy} - l^+_{yx}$ and substituting into the result of Theorem 4, we obtain the
proof.

□



9.10. Proof of Theorem 5

We present the proof for the case $x, y \in V_2(G_2)$, as the other case follows by symmetry. Once again, we
need a lemma to determine $Tr(L^+_{G_2})$ in terms of the elements of $L^+_{G_1}$.

Lemma 4. Let $G_1(V_1, E_1)$ be a simple, connected, unweighted graph with a bridge edge $e_{ij} \in E_1(G_1)$
which upon deletion produces two disjoint simple graphs $G_2(V_2, E_2)$ and $G_3(V_3, E_3)$. Then,

\[ Tr(L^+_{G_2}) = \sum_{x \in V_2(G_2)} l^{+(1)}_{xx} - \frac{1}{n_2} \sum_{x \in V_2(G_2)} \sum_{y \in V_2(G_2)} l^{+(1)}_{xy} \tag{38} \]
9.10.1. Proof of Lemma 4

The proof follows simply by observing $\Omega^{G_2}_{xy} = \Omega^{G_1}_{xy}$, $\forall (x, y) \in V_2(G_2) \times V_2(G_2)$ and substituting values in
terms of the elements of $L^+_{G_1}$.

□

9.10.2. Rest of the Proof of Theorem 5

Follows similarly from the triangle inequality in Lemma 1 by confining to node pairs $(x, y) \in V_2(G_2) \times
V_2(G_2)$, and then substituting the result of Lemma 4 and other relevant effective resistance values in terms
of $L^+_{G_1}$.

□